On 18 November 2025 at 11:20 UTC, Cloudflare experienced a significant outage that disrupted access to a wide range of websites and online services. Visitors to affected sites were met with HTTP 5xx error pages, leaving many unable to reach the applications and services they rely on daily. Cloudflare products such as Workers KV, Cloudflare Access, and Turnstile, which provide key-value storage, authentication, and bot verification respectively, were also affected, leading to widespread but temporary disruption across the company’s network.

From the outset, it was natural to suspect that a cyberattack might be behind such a large-scale failure. After all, Cloudflare’s network safeguards some of the busiest sites and applications on the Internet, and any disruption can have immediate, far-reaching consequences. As the company investigated the incident, however, it became clear that no cyberattack, external actor, or intentional intrusion was involved. Instead, the outage was triggered by an internal error: a configuration change in one of Cloudflare’s database systems.

According to Cloudflare, a change to the permissions of a ClickHouse database cluster caused the Bot Management system’s “feature file” to be generated with far more entries than expected. This file is critical to Cloudflare’s operations because it feeds the machine learning models that detect bots and automate traffic management. When the file grew beyond the size limit the routing software could handle, the core proxy systems responsible for processing network traffic failed, producing cascading 5xx errors across multiple services.
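To make that failure mode concrete, the sketch below shows, in Rust, how a hard limit on a configuration file can take down a proxy when the file unexpectedly grows. Everything here is an illustrative assumption rather than Cloudflare’s actual code: the names, the limit of 200 features, and the panic on an oversized file are hypothetical, but the pattern of pre-allocating for a fixed maximum and treating overflow as a fatal error mirrors the behaviour described in the outage report.

```rust
// Minimal sketch (not Cloudflare's actual code): a proxy module that loads a
// bot-management "feature file" into a preallocated, fixed-size table.
// MAX_FEATURES and all names are illustrative assumptions.

const MAX_FEATURES: usize = 200; // hard cap baked into the proxy at startup

#[derive(Debug)]
enum FeatureLoadError {
    TooManyFeatures { got: usize, limit: usize },
}

struct FeatureTable {
    names: Vec<String>, // preallocated with capacity MAX_FEATURES
}

impl FeatureTable {
    fn load(feature_file_lines: &[String]) -> Result<Self, FeatureLoadError> {
        if feature_file_lines.len() > MAX_FEATURES {
            // An oversized file, like the one produced by the bad database
            // query, trips this check and surfaces as an error to the caller.
            return Err(FeatureLoadError::TooManyFeatures {
                got: feature_file_lines.len(),
                limit: MAX_FEATURES,
            });
        }
        let mut names = Vec::with_capacity(MAX_FEATURES);
        names.extend(feature_file_lines.iter().cloned());
        Ok(FeatureTable { names })
    }
}

fn refresh_bot_features(feature_file_lines: &[String]) -> FeatureTable {
    // Treating the error as unrecoverable is the failure mode described in
    // the outage: one bad input terminates the worker, and requests flowing
    // through it see 5xx errors.
    FeatureTable::load(feature_file_lines).expect("feature file exceeded limit")
}
```

The key point of the sketch is that the crash comes from treating a recoverable configuration error as fatal, so a single bad file stops the traffic path entirely.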

Initially, the network behaved in a confusing way: some systems appeared to recover temporarily before failing again, which made the outage resemble a large-scale DDoS attack. Complicating matters further, Cloudflare’s status page went down at the same time as the network errors, leading the team to briefly consider the possibility of an external attack. Careful analysis, however, confirmed that the root cause was entirely internal: the database change inadvertently doubled the number of feature entries, and as the oversized files propagated through the network, the software could not process them.
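As an illustration of how a permissions change can silently double the entries in a generated file, the hypothetical Rust snippet below builds a feature list from database column metadata. If a second copy of the same tables suddenly becomes visible and the query is not filtered by database name, every feature appears twice. The database and table names used here are assumptions for illustration, not Cloudflare’s actual schema or pipeline.

```rust
// Illustrative only: building a feature list from database metadata rows.
// If a permissions change makes a second database (here "r0") with the same
// tables visible, and the rows are not filtered by database name, every
// feature shows up twice and the generated file doubles in size.

#[derive(Clone)]
struct ColumnMeta {
    database: String,
    table: String,
    column: String,
}

fn build_feature_list(rows: &[ColumnMeta]) -> Vec<String> {
    // No filter on `database` and no deduplication: duplicate rows pass
    // straight through into the generated feature file.
    rows.iter()
        .map(|r| format!("{}.{}", r.table, r.column))
        .collect()
}

fn main() {
    let before = vec![ColumnMeta {
        database: "default".into(),
        table: "bot_features".into(),
        column: "score_signal".into(),
    }];

    // After the permissions change, the same table is visible under a second
    // database, so the unfiltered metadata query returns it twice.
    let mut after = before.clone();
    after.push(ColumnMeta {
        database: "r0".into(),
        table: "bot_features".into(),
        column: "score_signal".into(),
    });

    assert_eq!(build_feature_list(&before).len(), 1);
    assert_eq!(build_feature_list(&after).len(), 2); // entry count doubled
}
```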

The company acted quickly to mitigate the outage. By stopping the generation of new feature files, rolling back to a known good version, and restarting the affected core proxy systems, Cloudflare gradually restored service. By 14:30 UTC, the main impact had been addressed, and over the next few hours, all downstream systems were fully recovered, with normal operations resuming by 17:06 UTC.

Although the core impact lasted roughly three hours, the ripple effects were felt across the Internet. Users experienced login failures, delays, and service disruptions while internal teams worked intensively to stabilize the network. Cloudflare acknowledged the severity of the incident, noting that outages of this scale are unacceptable given the company’s critical role in the global Internet ecosystem. The company has committed to hardening its systems to prevent similar failures in the future, including strengthening configuration file management, implementing fail-safes, and reviewing error handling across all core network modules.
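One generic way to implement the kind of fail-safe Cloudflare describes, sketched below under the assumption that internally generated configuration should be treated like untrusted input, is to validate a newly generated feature file and fall back to the last known good version when validation fails. This is a common pattern, not Cloudflare’s published fix, and the names and limit are hypothetical.

```rust
// Generic fail-safe pattern (an assumption, not Cloudflare's published fix):
// validate a freshly generated configuration file and keep serving the last
// known good version if the new one is invalid or oversized.

const MAX_FEATURES: usize = 200; // illustrative limit

struct FeatureConfig {
    features: Vec<String>,
}

struct BotManagement {
    active: FeatureConfig, // last known good configuration
}

impl BotManagement {
    /// Try to apply a new feature file; on failure, log and keep the old one
    /// so the traffic path never goes down because of a bad config push.
    fn try_refresh(&mut self, new_features: Vec<String>) {
        if new_features.is_empty() || new_features.len() > MAX_FEATURES {
            eprintln!(
                "rejecting feature file with {} entries; keeping last known good ({})",
                new_features.len(),
                self.active.features.len()
            );
            return;
        }
        self.active = FeatureConfig { features: new_features };
    }
}
```

The design choice worth noting is that a rejected configuration push degrades bot detection freshness rather than availability: traffic keeps flowing on the previous file while operators investigate.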

The incident serves as a stark reminder that even world-class networks can be vulnerable to internal errors and misconfigurations, and that not all large-scale outages are the result of cyberattacks. For organizations and users alike, it underscores the importance of resilience, careful change management, and continuous monitoring in complex digital environments. While cyber threats are ever-present, sometimes the most disruptive risks come from within, and Cloudflare’s outage on 18 November 2025 is a clear illustration of this reality.

Ultimately, the outage highlights both the scale and the fragility of modern Internet infrastructure. It was a serious disruption, yet it was caused not by hackers or malicious actors but by an internal misconfiguration.
