
But 1,000 tokens per second is actually modest by Cerebras standards. The company has measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI’s own open-weight gpt-oss-120B model, which suggests that Codex-Spark’s comparatively lower speed reflects the overhead of a larger or more complex model.
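To put those throughput figures in concrete terms, here is some illustrative arithmetic showing how long a single response would take at each quoted rate. The rates are the article's numbers; the 2,000-token response length is an assumed example, not something the article specifies.

```python
# Illustrative only: time to generate a fixed-length response at the
# throughput figures quoted above. The 2,000-token length is an assumption.
response_tokens = 2000

rates_tps = {
    "Codex-Spark": 1000,                  # tokens/second
    "Llama 3.1 70B on Cerebras": 2100,
    "gpt-oss-120B on Cerebras": 3000,
}

for name, tps in rates_tps.items():
    seconds = response_tokens / tps
    print(f"{name}: {seconds:.2f} s for {response_tokens} tokens")
```

At these rates the gap is fractions of a second per response, but over hundreds of agentic round trips per session, those fractions compound into the iteration-speed difference the article describes.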
AI coding agents have had a breakout year, with tools like OpenAI's Codex and Anthropic's Claude Code reaching a new level of usefulness for rapidly building prototypes, interfaces, and boilerplate code. OpenAI, Google, and Anthropic are all racing to ship more capable coding agents, and latency has become a key differentiator: a model that codes faster lets developers iterate faster.
Faced with stiff competition from Anthropic, OpenAI has been iterating rapidly on its Codex line, releasing GPT-5.2 in December after CEO Sam Altman issued an internal "code red" memo about competitive pressure from Google, then shipping GPT-5.3-Codex a few days ago.
Diversifying away from Nvidia
Spark's deeper hardware story may be more consequential than its benchmark scores. The model runs on Cerebras' Wafer Scale Engine 3, a dinner-plate-sized chip that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to come out of it.
OpenAI has spent the last year systematically reducing its dependence on Nvidia. The company signed a massive multi-year deal with AMD in October 2025, inked a $38 billion cloud computing deal with Amazon in November, and is designing its own custom AI chip for eventual manufacturing by TSMC.
Meanwhile, a planned $100 billion infrastructure deal with Nvidia has so far failed to materialize, although Nvidia has since committed to a $20 billion investment. Reuters reported that OpenAI became dissatisfied with the speed of some Nvidia chips for inference tasks, which is exactly the type of workload for which OpenAI designed Codex-Spark.
Regardless of which chip is under the hood, speed matters, though it may come at the expense of accuracy. For developers who spend their days inside a code editor waiting for AI suggestions, 1,000 tokens per second may feel less like carefully guiding a jigsaw and more like running a rip saw. Just watch what you're cutting.