Depot - Delays with GitHub Actions Jobs and Docker Builds – Incident details

Delays with GitHub Actions Jobs and Docker Builds

Resolved
Major outage
Started about 19 hours ago
Lasted about 4 hours

Affected

Container builds

Degraded performance from 5:38 PM to 6:27 PM, Major outage from 6:27 PM to 8:36 PM, Degraded performance from 8:36 PM to 9:26 PM

us-east-1

Degraded performance from 5:38 PM to 6:27 PM, Major outage from 6:27 PM to 8:36 PM, Degraded performance from 8:36 PM to 9:26 PM

eu-central-1

Degraded performance from 5:38 PM to 6:27 PM, Major outage from 6:27 PM to 8:36 PM, Degraded performance from 8:36 PM to 9:26 PM

GitHub Actions

Degraded performance from 5:38 PM to 6:27 PM, Major outage from 6:27 PM to 8:36 PM, Degraded performance from 8:36 PM to 9:26 PM

Depot-managed Actions Runners

Degraded performance from 5:38 PM to 6:27 PM, Major outage from 6:27 PM to 8:36 PM, Degraded performance from 8:36 PM to 9:26 PM

Github.com - Actions

Updates
  • Resolved

    We have brought container builds and GitHub Actions back to normal capacity. We appreciate your patience and will publish a full post-incident report on our blog in the coming days.

    We will continue to monitor for anything that may come up.

  • Update

    We're seeing queue times beginning to return to normal for GitHub Actions.

    Some container builds are failing to launch. We're investigating these, but in the meantime, you can try resetting your cache to clear out the bad state.

  • Update

    We have brought the system back to normal capacity, and backlogs are caught up for primary regions. Additional regions are being brought back up to normal capacity at the moment.

    We will continue to monitor recovery.

  • Update

    We've added another workaround for an AWS CPU issue that is currently backing up the system under load. We're working to bring the system back to a normal state so that the workaround can be removed and normal processing can restart.

  • Update

    We are still working to bring the system back to full capacity while we process a large backlog of work. Your jobs will remain queued while we build up slack in the system.

  • Monitoring

    We have rolled out an initial fix and are seeing early recovery for container builds and GitHub Actions.

  • Identified

    We have identified the root cause of the problem and are rolling out a fix across Depot now.

  • Investigating
    We are currently investigating this incident.