GitHub Actions on ECS
June 7, 2026

Until recently, hosting GitHub Actions runners on AWS ECS had a serious limitation: no Docker support on Fargate. Sure, you could run your own EC2 Docker-in-Docker setup with an ASG, but you end up giving root access to the host instance. Another issue is that cold starts can take 5+ minutes. Warm pooling addresses this, but it requires coordinating multiple other services or setting up a daemon that restarts the ECS agent on every scale-out from the stopped state so that your instances register with the cluster.

At this point you may be tempted to reach for EKS. But do we really need to introduce all the complexities of Kubernetes just for regular builds? So how do we achieve consistent (Docker support), safe (no root access), cost-effective (scale from zero), fast (no long cold starts), and easy-to-manage (no Kubernetes) self-hosted builds? ECS Managed Instances.
I see ECS Managed Instances as a nice blend of EC2 compute selection and Fargate scaling.[1] You have control over what runs with compute configurations while AWS manages how it runs by taking care of provisioning, scaling, patching, etc. Root access is also locked down and managed by AWS.[2] I've drawn a simplified diagram of the setup below, highlighting the networking and the main interactions. You can find the full Terraform deployment here.
Note that the ECS Task (Runner) refers to the GitHub Actions self-hosted runner that runs the build steps defined in your YAML workflow files. The task runs on Fargate by default; if the webhook event carries a "docker" label, it runs on ECS Managed Instances instead. From there the runner communicates with GitHub, routing through the NAT Gateway and IGW, and updates the UI with the status of your builds.

I did encounter a small hurdle when setting this up. At times, some runners wouldn't pick up jobs and would just sit idle. After further investigation, I found a thread where others were experiencing the same problem; it revealed a bug where jobs with duplicate labels wouldn't be run.[3] To avoid this, I inject a random hash label within Lambda and pass it directly to the runner command, ensuring every queued event results in a running job.
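The routing and label workaround described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the Terraform deployment: the function name `plan_runner`, the `launch_target` values, and the `run-` label prefix are all hypothetical, and the sketch assumes the standard `workflow_job` webhook payload, where `action` is `"queued"` and `workflow_job.labels` is a list of strings.

```python
import secrets
from typing import Optional


def plan_runner(payload: dict) -> Optional[dict]:
    """Decide where to launch a runner for a workflow_job webhook payload.

    Returns None for events that should not start a runner.
    """
    # Only queued jobs need a new runner; other actions are no-ops.
    if payload.get("action") != "queued":
        return None

    labels = payload.get("workflow_job", {}).get("labels", [])

    # A "docker" label routes the job to Managed Instances; otherwise Fargate.
    target = "managed-instances" if "docker" in labels else "fargate"

    # Hypothetical unique label so no two runners register with identical labels.
    unique_label = f"run-{secrets.token_hex(8)}"

    return {
        "launch_target": target,
        "runner_labels": [*labels, unique_label],
    }
```

The returned `runner_labels` would then be joined and handed to the runner's registration command, so each runner carries the labels the job asked for plus one label of its own.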
I've secured the deployment at multiple levels. As with any CI/CD pipeline, security is especially important here, since just one compromise could disrupt every system downstream.
- Defined a separate private mirror repo to do builds
- Used fine-grained token permissions for GitHub access
- Scoped GitHub App permissions to least privilege
- Restricted IAM roles to least privilege
- Verified webhook event headers
- Verified runner executable hash
- Configured runners as ephemeral
- Used Managed Instances with secure root access
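Two of the checks in this list are easy to illustrate. The sketch below assumes that "verified webhook event headers" means validating GitHub's `X-Hub-Signature-256` header (an HMAC-SHA256 of the raw request body, prefixed with `sha256=`), and that the runner executable is checked against a published SHA-256 digest. The function names are hypothetical and the actual deployment may do this differently.

```python
import hashlib
import hmac
from typing import Optional


def is_valid_signature(secret: bytes, body: bytes, header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header value against the raw request body."""
    if not header or not header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode(), header.encode())


def matches_sha256(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes (e.g. the runner archive) against a known digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual.encode(), expected_hex.strip().lower().encode())
```

The signature must be computed over the raw body bytes exactly as received; re-serializing parsed JSON before hashing will usually produce a mismatch.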
These measures, combined with the serverless cost structure, make for a robust solution that can still scale. Of course, there is always room for improvement. I could have further secured the API Gateway with AWS WAF to mitigate DoS attacks, used Lambda filters to drop all non-queue events before invocation so I don't pay for no-op runs, or supported Fargate Spot tasks for extra savings. Further security could also involve using Secrets Manager instead of SSM Parameter Store for automatic rotation of credentials. An idle timeout check on the self-hosted runners would be useful as well: ephemeral runners still stick around for 24 hours, so costs can add up if a temporary GitHub outage leaves runners disconnected from their jobs.

Each of those updates has its tradeoffs, but overall, Managed Instances have unlocked a solution to the Docker on EC2 problem without compromising security and are a nice addition to the existing feature set in ECS.
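As a rough illustration of the idle timeout idea, a check like the following could run inside the runner task or in a scheduled job. Everything here is an assumption: the 15-minute threshold, the function name, and the notion that the caller knows when the runner started and whether it has picked up a job. It only covers runners that never received work, not ones that lose contact mid-job.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold; tune it to your queue latency.
IDLE_TIMEOUT = timedelta(minutes=15)


def should_stop_runner(
    started_at: datetime,
    has_picked_up_job: bool,
    now: Optional[datetime] = None,
    idle_timeout: timedelta = IDLE_TIMEOUT,
) -> bool:
    """Return True when a runner has waited longer than idle_timeout without a job."""
    now = now or datetime.now(timezone.utc)
    if has_picked_up_job:
        # A runner that is working (or has worked) is not considered idle here.
        return False
    return now - started_at >= idle_timeout
```

A caller that gets `True` would stop the ECS task, which is what actually ends the billing for that runner.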
References
[1] Amazon ECS Managed Instances. AWS Documentation.
[2] Amazon ECS Managed Instances security model. AWS Documentation.
[3] Jobs with duplicate labels aren't picked up by runners. GitHub Community Discussion.