Resolved
This incident has been resolved.
Monitoring
Queue times for GitHub Actions jobs have returned to normal. API failures affecting container builds and the container registry have also stopped. We will continue to monitor for a full recovery.
Monitoring
AWS has confirmed an outage of the use1-az2 zone. We have determined that this outage has been affecting network traffic to our API service: API requests that were randomly routed through Amazon's load balancer in that zone failed to reach the API service. We have routed API traffic out of the affected zone and are seeing API request errors subside.
Identified
We continue to see increased queue times for GitHub Actions jobs, though many jobs are still being processed.
We also continue to see an increase in error responses from our API and are working to resolve this.
Monitoring
We have increased capacity for Actions runner jobs to process the backlog.
We have also observed some API request failures when starting a container build. We have deployed additional API capacity and expect retries of the failed build requests to succeed.
Monitoring
We continue to see elevated queue times for depot-ubuntu-24.04 jobs.
Monitoring
We are still observing increased queue times for depot-ubuntu-24.04 jobs as the backlog of incoming jobs is processed.
Monitoring
AWS is experiencing an incident in a single availability zone. We have routed all GitHub Actions jobs away from the affected zone and are monitoring.