Cloudflare explains Tuesday’s outage that temporarily took down ChatGPT

Cloudflare’s bot management controls are supposed to help tackle problems like crawlers scraping information to train AI. The company also recently announced a system that uses generative AI to build the “AI Labyrinth,” a new mitigation approach that uses AI-generated content to slow, confuse, and waste the resources of AI crawlers and other bots that don’t respect “no crawl” instructions.

However, it says that today’s problems were caused by changes to a database’s permissions system, not by generative AI technology, DNS, or what Cloudflare initially suspected: a cyberattack or malicious activity such as a “hyper-scale DDoS attack.”

According to Cloudflare CEO Matthew Prince, the machine learning model behind bot management, which generates bot scores for requests traveling across its network, relies on a frequently updated configuration file that helps identify automated requests. However, he writes, “This file contains a large number of duplicate ‘feature’ rows due to a change in our underlying ClickHouse query behavior that created this file.”

The post has more details about what happened next, but in short, the query change caused the ClickHouse database to return duplicate information. As the configuration file rapidly exceeded the preset memory limit, it took down “the main proxy system that handles traffic processing for any traffic that relies on the bot module for our clients.”
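
To make that failure mode concrete, here is a minimal Python sketch of a loader that enforces a preset limit on a feature file. It is not Cloudflare’s code: the limit value, the function names, and the fallback to a previously loaded file are all assumptions made for illustration. It only shows how duplicate rows can push an otherwise valid file past a hard cap, and how a more defensive loader might behave.

```python
MAX_FEATURES = 200  # hypothetical preset limit, chosen only for this example


class FeatureLimitExceeded(Exception):
    """Raised when a feature file has more rows than the loader allows."""


def load_features_strict(rows, limit=MAX_FEATURES):
    # Fails hard as soon as the file is larger than the preset limit.
    if len(rows) > limit:
        raise FeatureLimitExceeded(f"{len(rows)} rows exceeds limit of {limit}")
    return list(rows)


def load_features_defensive(rows, last_known_good, limit=MAX_FEATURES):
    # Drops duplicate rows (keeping first-seen order), then falls back to the
    # previous good file if the result is still over the limit.
    deduped = list(dict.fromkeys(rows))
    if len(deduped) > limit:
        return last_known_good
    return deduped


# 120 distinct features, each emitted twice by a (hypothetical) faulty query.
rows = [f"feature_{i}" for i in range(120)] * 2
```

With these inputs, `load_features_strict(rows)` raises `FeatureLimitExceeded` because the file has 240 rows, while `load_features_defensive(rows, last_known_good=[])` returns the 120 distinct features.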

As a result, customers whose rules relied on Cloudflare’s bot scores to block certain bots saw false positives that cut off real traffic, while Cloudflare customers who did not use the generated bot scores in their rules remained online.
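
The difference between the two groups of customers can be sketched in a few lines of Python. The threshold, the function name, and the assumption that a broken scoring module hands every request the lowest possible score are hypothetical; the sketch illustrates the logic only and does not describe Cloudflare’s actual rule engine.

```python
BLOCK_BELOW = 30  # hypothetical threshold: low scores mean "likely a bot"


def rule_blocks(bot_score: int, uses_bot_score: bool) -> bool:
    # A customer rule that ignores bot scores never blocks on them.
    if not uses_bot_score:
        return False
    # A customer rule that uses bot scores blocks anything under the threshold.
    return bot_score < BLOCK_BELOW


healthy_human_score = 90  # what a real visitor might score on a normal day
broken_human_score = 0    # assumed failure mode: every request gets a bottom score
```

Under these assumptions, `rule_blocks(healthy_human_score, True)` lets the visitor through, `rule_blocks(broken_human_score, True)` blocks the same visitor as a false positive, and `rule_blocks(broken_human_score, False)` is unaffected.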


