How OpenAI is scaling the PostgreSQL database to 800 million users

While vector databases still have many legitimate use cases, organizations including OpenAI rely on PostgreSQL to get the job done.

In a blog post on Thursday, OpenAI revealed how it is using the open-source PostgreSQL database.

OpenAI runs ChatGPT and its API platform for 800 million users on a single-primary PostgreSQL instance – not a distributed database, not a sharded cluster. An Azure PostgreSQL Flexible Server handles all writes, while roughly 50 read replicas spread across multiple regions handle the reads. The system processes millions of queries per second while maintaining low double-digit-millisecond P99 latency and five-nines availability.

The setup challenges conventional scaling wisdom and provides enterprise architects with insight into what really works at scale.

The lesson here is not to copy OpenAI’s stack. It is that architectural decisions should be driven by workload patterns and operational constraints – not by hype or fashionable infrastructure choices. OpenAI’s PostgreSQL setup shows how far proven systems can stretch when teams deliberately optimize rather than re-architect prematurely.

"For years, PostgreSQL has been one of the most important, under-the-hood data systems powering core products like ChatGPT and OpenAI’s APIs," OpenAI engineer Bohan Zhang wrote in a technical disclosure. "Over the past year, our PostgreSQL load has grown more than 10x, and continues to grow rapidly."

The company achieved this scale through targeted optimizations, including connection pooling that reduces connection times from 50 milliseconds to 5 milliseconds and cache locking to prevent "thundering herd" problems, where a cache miss triggers a flood of simultaneous database queries.
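OpenAI has not published its cache-locking code, but the general technique is often called "single flight": when a key is missing, only one caller recomputes it while concurrent callers wait and reuse the result. A minimal Python sketch of the idea (all names here are illustrative, not OpenAI's implementation) might look like:

```python
import threading

class SingleFlightCache:
    """Cache that lets only one caller recompute a missing key at a time,
    preventing a thundering herd of identical database queries."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._meta = threading.Lock()  # guards the per-key lock table

    def _lock_for(self, key):
        with self._meta:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, loader):
        if key in self._data:            # fast path: cache hit
            return self._data[key]
        with self._lock_for(key):        # one loader per key at a time
            if key not in self._data:    # re-check after acquiring the lock
                self._data[key] = loader()  # e.g. the expensive DB query
            return self._data[key]
```

With this pattern, a burst of requests for the same cold key results in one database query instead of thousands, which is exactly the failure mode cache locking is meant to prevent.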

Why does PostgreSQL matter for enterprises?

PostgreSQL handles operational data for ChatGPT and OpenAI’s API platforms. The workload is highly read-oriented, making PostgreSQL a good fit. However, PostgreSQL’s multiversion concurrency control (MVCC) creates challenges under heavy write loads.

When updating data, PostgreSQL copies entire rows to create new versions, causing write amplification and forcing queries to scan through multiple versions to find the current data.
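PostgreSQL's actual MVCC machinery lives in the server's C code; this toy Python model only illustrates the versioning behavior described above: an update appends a whole new copy of the row rather than editing it in place, and readers must look past old versions to find the current one.

```python
class MVCCTable:
    """Toy model of PostgreSQL-style row versioning (illustrative only)."""

    def __init__(self):
        self.versions = []  # every update appends a full copy of the row

    def update(self, row_id, new_row, txid):
        # write amplification: the entire row is copied, not just the change
        self.versions.append({"row_id": row_id, "data": new_row, "txid": txid})

    def read_latest(self, row_id):
        # readers scan past dead versions to find the newest one
        live = [v for v in self.versions if v["row_id"] == row_id]
        return max(live, key=lambda v: v["txid"])["data"]

t = MVCCTable()
t.update(1, {"name": "a"}, txid=1)
t.update(1, {"name": "b"}, txid=2)  # a second full row version now exists
latest = t.read_latest(1)           # must skip the dead txid=1 version
```

Real PostgreSQL reclaims dead versions with vacuuming, but under heavy write load the version churn itself becomes the cost, which is the tradeoff OpenAI designed around.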

Instead of fighting this limitation, OpenAI built its strategy around it. At the scale of OpenAI, these tradeoffs are not theoretical – they determine which workloads will remain on PostgreSQL and which should move elsewhere.

How OpenAI is optimizing PostgreSQL

At large scale, conventional database wisdom points toward one of two paths: shard PostgreSQL across multiple primary instances so writes can be distributed, or migrate to a distributed SQL database like CockroachDB or YugabyteDB that is designed from the start to handle large scale. Most organizations would have taken one of these paths years ago, long before reaching 800 million users.

Sharding, or moving to a distributed SQL database, eliminates the single-writer constraint. With sharding, application code must route queries to the correct shard and distributed transactions become harder to manage; a distributed SQL database handles that coordination automatically. Both approaches, however, introduce significant complexity and operational overhead.

Instead of sharding PostgreSQL, OpenAI instituted a hybrid strategy: no new tables in PostgreSQL. New workloads default to sharded systems like Azure Cosmos DB. Existing write-heavy workloads that can be partitioned horizontally are moved out. Everything else remains in PostgreSQL with aggressive optimization.

This approach provides enterprises a practical alternative to wholesale re-architecture. Instead of spending years rewriting hundreds of endpoints, teams can identify specific bottlenecks and move only those workloads to purpose-built systems.

Why does it matter?

OpenAI’s experience in scaling PostgreSQL reveals several practices that enterprises can adopt regardless of their scale.

Build operational safeguards at multiple layers. OpenAI’s approach adds cache locking to prevent "thundering herd" issues, connection pooling (which reduced their connection time from 50 ms to 5 ms), and rate limiting at the application, proxy, and query levels. Workload isolation routes low-priority and high-priority traffic to separate instances, ensuring that a poorly optimized new feature cannot degrade core services.
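The post does not specify which rate-limiting algorithm OpenAI uses; a token bucket is one standard choice for capping query rates at the application layer. A self-contained sketch (hypothetical parameters, not OpenAI's code):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: requests spend tokens, and tokens refill
    at a fixed rate, so sustained throughput is capped while short bursts
    up to the bucket capacity are still allowed."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill in proportion to elapsed time, capped at bucket capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # admit the query
        return False      # shed load before it reaches the database
```

Placing a limiter like this in front of the database turns an overload into explicit rejections the application can retry, rather than letting the primary saturate.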

Review and monitor ORM-generated SQL in production. Object-relational mapping (ORM) frameworks such as Django, SQLAlchemy, and Hibernate automatically generate database queries from application code, which is convenient for developers. However, OpenAI found that a single ORM-generated query joining 12 tables caused several high-severity incidents as traffic increased. The convenience of having the framework generate SQL creates hidden scaling risks that are only exposed under production load. Make reviewing these queries a standard practice.
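OpenAI did not describe its review tooling. As one crude guardrail, a team could lint the SQL an ORM emits and escalate queries that touch too many tables; the regex-based sketch below is hypothetical and would be replaced by a real SQL parser in practice.

```python
import re

def count_tables(sql: str) -> int:
    """Rough count of tables a query references: one per FROM clause plus
    one per JOIN. A real linter would parse the SQL properly."""
    froms = len(re.findall(r"\bFROM\b", sql, flags=re.IGNORECASE))
    joins = len(re.findall(r"\bJOIN\b", sql, flags=re.IGNORECASE))
    return froms + joins

def flag_wide_query(sql: str, max_tables: int = 5) -> bool:
    """Return True if the generated query should get human review
    before it ships (threshold is an illustrative choice)."""
    return count_tables(sql) > max_tables
```

Hooked into a test suite or a query-logging middleware, a check like this surfaces a 12-table join at review time instead of during an incident.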

Enforce strict operational discipline. OpenAI allows only lightweight schema changes – anything that triggers a full table rewrite is prohibited. Schema changes have a 5-second timeout. Long-running queries are automatically terminated to prevent blocking database maintenance operations. When backfilling data, they enforce rate limits so aggressively that the operation can take more than a week.
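The post does not detail the backfill throttling mechanism. The usual shape is small batches with deliberate pauses between them so the write load on the primary stays bounded; a minimal sketch (all names illustrative) might be:

```python
import time

def backfill(rows, apply_batch, batch_size=1000, batches_per_sec=2.0):
    """Process a backfill in small batches, sleeping between batches so
    write load on the primary stays bounded (illustrative sketch)."""
    interval = 1.0 / batches_per_sec
    done = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        apply_batch(batch)    # e.g. one bounded UPDATE per batch
        done += len(batch)
        time.sleep(interval)  # deliberate pacing: big backfills take days
    return done
```

At 2 batches of 1,000 rows per second, a billion-row backfill takes nearly six days, which is consistent with OpenAI's observation that a throttled backfill can run for more than a week.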

Read-heavy workloads with bursty writes can run on single-primary PostgreSQL longer than conventional wisdom suggests. The sharding decision should depend on the workload pattern rather than the number of users.

This approach is particularly relevant for AI applications, which often have heavily read-oriented workloads with unpredictable traffic spikes. These characteristics align with the pattern where single-primary PostgreSQL scales effectively.

The lesson is straightforward: Identify the real bottlenecks, optimize proven infrastructure where possible, and move selectively when necessary. Bulk re-architecture is not always the answer to scaling challenges.


