Errors loading CircleCI
Incident Report for CircleCI
Postmortem

**Summary and Impact:**

On July 2, 2024, at 07:34 UTC, a production deployment change made while following the MicroFrontends (MFEs) migration Ingress guide broke Kong routing for the circleci.com host, making the API and UI unavailable. The issue was resolved by reverting the change in web-ui-v1, which restored Kong routing.

**Customer Impact Analysis:**

The incident had a major impact: customers could not access the UI, and pipelines could not be ingested. Based on traffic data, an estimated 16.5k pipelines were affected.

**Background:**

The incident was related to merging MicroFrontends (MFEs) into the web-ui-consolidated monorepo. A change to Kong routing made during this work broke API and UI functionality.

**What Happened:**

The way Kong handles route priorities produced unexpected routing behavior. A bug in the Kong router made the incident worse by affecting multiple routes.
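To illustrate how route priorities can produce this kind of failure, here is a minimal, hypothetical Python model of a priority-ordered router. It is not Kong's actual matching algorithm, and the report does not say which routes were involved: the route names, priorities, path prefixes, and services (`api-service`, `mfe-service`, and the use of `web-ui-v1` as a catch-all) are all invented for illustration. The sketch shows how adding one broad route with a higher priority on the same host can capture traffic meant for existing routes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    name: str
    host: str
    path_prefix: str
    priority: int  # hypothetical: a higher value wins when several routes match
    service: str


def match(routes: list[Route], host: str, path: str) -> Optional[Route]:
    """Return the route a request would hit in this simplified model:
    highest priority first, longest path prefix as the tie-breaker."""
    candidates = [
        r for r in routes if r.host == host and path.startswith(r.path_prefix)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, len(r.path_prefix)))


# Hypothetical route table before the change.
before = [
    Route("api", "circleci.com", "/api/", 10, "api-service"),
    Route("web-ui", "circleci.com", "/", 0, "web-ui-v1"),
]

# Hypothetical MFE ingress route added with a broad prefix and a higher priority.
after = before + [Route("mfe", "circleci.com", "/", 20, "mfe-service")]

for table_name, table in (("before", before), ("after", after)):
    for path in ("/api/v2/pipeline", "/dashboard"):
        route = match(table, "circleci.com", path)
        print(table_name, path, "->", route.service if route else "404")
```

Running it prints that, before the change, `/api/v2/pipeline` reaches `api-service` and `/dashboard` reaches `web-ui-v1`; after the change, both paths reach `mfe-service`. In this model, removing the added route restores the original matching, which mirrors why reverting the change restored routing.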

**Lessons Learned and Future Steps:**

  • Test the incident bot and improve incident response protocols.

  • Investigate how Kong evaluates route priorities, and upgrade Kong.

  • Enhance internal documentation on route priorities to prevent similar incidents.

**Timeline:**

  • 07:35:46 UTC: Customer Impact Start

  • 07:39:53 UTC: Initial investigation initiated

  • 08:05:45 UTC: Helm rollback applied; issue resolved

  • 08:06:00 UTC: Customer Impact End

  • 08:38:00 UTC: Incident End
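The intervals implied by the timeline can be computed directly from the listed timestamps. The short Python sketch below does this; measuring every interval from Customer Impact Start is a choice of reference point made for this illustration, not a metric defined in the report.

```python
from datetime import datetime, timedelta

# Timestamps copied from the timeline (UTC, all on July 2, 2024).
TIMELINE = {
    "impact_start": "07:35:46",
    "investigation_start": "07:39:53",
    "rollback_applied": "08:05:45",
    "impact_end": "08:06:00",
    "incident_end": "08:38:00",
}


def parse(ts: str) -> datetime:
    # Only the time of day matters because every event falls on the same date.
    return datetime.strptime(ts, "%H:%M:%S")


def elapsed(start: str, end: str) -> timedelta:
    """Time between two named timeline events."""
    return parse(TIMELINE[end]) - parse(TIMELINE[start])


durations = {
    "time to start investigating": elapsed("impact_start", "investigation_start"),
    "time to rollback": elapsed("impact_start", "rollback_applied"),
    "customer impact": elapsed("impact_start", "impact_end"),
    "impact start to incident end": elapsed("impact_start", "incident_end"),
}

for label, delta in durations.items():
    print(f"{label}: {delta}")
```

This prints 0:04:07 until investigation began, 0:29:59 until the rollback, 0:30:14 of customer impact, and 1:02:14 from the start of impact to the end of the incident.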

We are committed to improving our systems and processes to prevent such incidents in the future.

Posted Jul 19, 2024 - 15:20 UTC

Resolved
We are now seeing jobs triggering as usual and the UI is now visible. Any affected pipelines may need a manual trigger on the relevant branch, or an empty commit pushed. Please reach out to our customer support engineering team if you require assistance.
Posted Jul 02, 2024 - 08:28 UTC
Monitoring
We have rolled out a fix and are seeing recovery.
Posted Jul 02, 2024 - 08:10 UTC
Investigating
We have identified an issue causing errors with CircleCI UI and API, preventing all jobs from running. We are working on a fix.
Posted Jul 02, 2024 - 08:09 UTC
This incident affected: Docker Jobs, Machine Jobs, macOS Jobs, Windows Jobs, and CircleCI UI.