
Amid the growing push toward AI agents, with both Anthropic and OpenAI shipping multi-agent tools this week, Anthropic is ready to show off some of its more daring AI coding experiments. But as always with claims of AI-related achievements, you’ll find some major caveats ahead.
On Thursday, Anthropic researcher Nicholas Carlini published a blog post describing how he set 16 instances of the company’s Claude Opus 4.6 AI model loose on a shared codebase with minimal supervision, tasking them with building a C compiler from scratch.
Over two weeks and roughly 2,000 Claude Code sessions costing about $20,000 in API fees, the AI model agents reportedly produced a 100,000-line Rust-based compiler capable of building a bootable Linux 6.9 kernel on x86, ARM, and RISC-V architectures.
Carlini, a research scientist on Anthropic’s Safeguards team who previously spent seven years at Google Brain and DeepMind, used a new feature launched with Claude Opus 4.6 called “Agent Teams.” In practice, each Claude instance ran inside its own Docker container, cloning a shared Git repository, claiming tasks by writing lock files, then pushing completed code back upstream. No orchestration agent directed traffic. Each instance independently identified whatever problem seemed most obvious to work on next and started solving it. When merge conflicts arose, the AI model instances resolved them on their own.
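The lock-file claiming step can be illustrated with a minimal Python sketch. This is not Carlini’s implementation: the function names, the `.lock` suffix, and the use of a local directory are all assumptions made for illustration. In the setup described above, the shared Git repository would presumably play the role that atomic file creation plays here, so that only one instance’s claim can land.

```python
import os
import tempfile


def try_claim(lock_dir, task_id, agent_id):
    """Try to claim a task by creating its lock file (hypothetical layout)."""
    path = os.path.join(lock_dir, f"{task_id}.lock")
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so at most one claimant can succeed for a given task.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(agent_id)  # record who holds the claim
    return True


def release(lock_dir, task_id):
    """Release a claim by deleting the lock file."""
    os.remove(os.path.join(lock_dir, f"{task_id}.lock"))


def claim_first_available(lock_dir, task_ids, agent_id):
    """Return the first task this agent manages to claim, or None."""
    for task_id in task_ids:
        if try_claim(lock_dir, task_id, agent_id):
            return task_id
    return None


def demo():
    """Two hypothetical agents competing for the same task list."""
    with tempfile.TemporaryDirectory() as lock_dir:
        tasks = ["parser", "codegen"]
        first = claim_first_available(lock_dir, tasks, "agent-1")
        second = claim_first_available(lock_dir, tasks, "agent-2")
        return first, second
```

With no orchestrator, each agent simply walks the task list and takes the first task it can lock, which is why two agents in `demo()` end up on different tasks.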
The resulting compiler, which Anthropic has released on GitHub, can compile a range of major open source projects, including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU. It achieved a 99 percent pass rate on the GCC torture test suite and, in what Carlini called “the developer’s ultimate litmus test,” it compiled and ran Doom.
It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: the specification is decades old and well-defined, comprehensive test suites already exist, and there is a known-good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development is not writing code that passes tests; it’s figuring out what the tests should be in the first place.
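To see why a known-good reference helps so much, consider differential testing: run the same inputs through the candidate and the reference, and flag every disagreement. The Python sketch below is a toy under stated assumptions, with plain functions standing in for compilers; `differential_test`, `reference_div2`, and `candidate_div2` are hypothetical names, and nothing here reflects the harness Carlini actually used.

```python
def differential_test(candidate, reference, inputs):
    """Return (input, expected, actual) for every input where the two disagree."""
    mismatches = []
    for x in inputs:
        expected = reference(x)  # the reference defines correct behavior
        actual = candidate(x)
        if actual != expected:
            mismatches.append((x, expected, actual))
    return mismatches


def reference_div2(x):
    """Toy 'known-good' behavior: division by 2 that rounds toward negative infinity."""
    return x // 2


def candidate_div2(x):
    """Toy 'new implementation' with a bug: it truncates toward zero instead."""
    return int(x / 2)


# The candidate agrees on non-negative inputs and diverges on negative odd ones.
failures = differential_test(candidate_div2, reference_div2, range(-3, 4))
```

No one had to write down the expected answers in advance; the reference supplies them, which is exactly the advantage most software projects lack.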