Resolved
AWS has marked this incident as resolved.
Monitoring
GitHub Actions jobs and container builds are processing normally. We will continue to monitor the open AWS incident to ensure this does not regress.
Monitoring
We have started routing GitHub Actions jobs back to primary infrastructure in us-east-1. Container builds and Actions jobs continue to process successfully.
Monitoring
AWS is starting to see some recovery of their control plane, and we are likewise starting to recover the ability to manage EC2 instances. Some jobs are able to process again, though disruptions are still expected.
Monitoring
Unfortunately, the AWS outage has regressed, and they are experiencing issues with their control plane networking. This is affecting our control plane's ability to manage EC2 instances and preventing jobs from starting.
Monitoring
We have started failing over GitHub Actions jobs to us-east-2. Capacity in this region is severely limited, so you may still see queueing and longer job start times. We continue to monitor network recovery in us-east-1.
Monitoring
The AWS incident is still ongoing, and Actions jobs and container builds continue to be unable to start in us-east-1.
We have routed GitHub Actions jobs to backup infrastructure in us-east-2, and some jobs are able to process. Queue performance is degraded, however, and long queue times or failures may still occur because portions of AWS's outage are affecting cross-zone services.
For container builds, you may be able to unblock builds by using a project located in EU Central.
Monitoring
The us-east-1 outage has been upgraded to a full degraded outage again, as multiple services are now facing cascading API errors. We are continuing to monitor the situation and assess what we can do from our side.
For container builds, it is worth trying to change your project's region to the EU region, as some instances may be able to launch there. From what we can see, API errors are occurring across regions, which may make even other regions less stable.
Monitoring
The AWS outage is ongoing: all requests to launch new EC2 instances or start existing stopped instances in us-east-1 are failing. This is preventing all GitHub Actions jobs and container builds from starting.
Monitoring
We are continuing to see EC2 in us-east-1 fail to launch instances in various AZs. We are attempting workarounds on our side while waiting for full service to be restored.
Monitoring
GitHub Actions jobs continue to be processed out of the backlog more slowly than we'd like because of a blanket rate limit EC2 has applied while AWS works to restore full service.
Monitoring
Our Dashboard, API, and container builds have fully recovered. We are now monitoring the backlog of queued GitHub Actions jobs while AWS processes its full Lambda and CloudTrail event stream. You may still see longer queue times for GitHub Actions while that catch-up is happening.
Monitoring
We are continuing to investigate and are monitoring for full DynamoDB recovery so we can process the backlog of work that is currently stuck in a queued state.
Monitoring
We are seeing improvements in error rates and are now working to process the backlog of container builds and GitHub Actions jobs. Certain architectures, such as ARM, appear to be recovering faster than x86.
Monitoring
We are starting to see partial recovery across a variety of core AWS services, but we are still seeing failures in other services that are preventing builds and jobs from launching. We will provide an update when we have more information.
Investigating
We are continuing to monitor issues with us-east-1 core services such as EC2 and IAM that are impacting our services. In addition, we are monitoring widespread errors in our upstream auth and database providers as a result of this outage.
Investigating
We are seeing elevated error rates across the majority of our services due to upstream AWS errors. We will share an update as soon as possible.
Investigating
We are currently investigating an elevated rate of upstream API errors from our cloud provider.