Andrej Karpathy's new open source 'AutoResearch' lets you run hundreds of AI experiments a night, with revolutionary implications

Over the weekend, Andrej Karpathy, the influential former Tesla AI chief and OpenAI founding member who coined the term "vibe coding," posted on X about his new open source project, AutoResearch.

It wasn’t a finished model or a huge corporate product: it was, by his own admission, a simple 630-line script, published on GitHub under the permissive, enterprise-friendly MIT license. But the ambition was much bigger: to automate the scientific method with AI agents while we humans sleep.

"The goal is to engineer your agents to make the fastest research progress indefinitely and without any involvement from you," He said on X.

The system functions as an autonomous optimization loop. An AI agent is given a training script and a fixed compute budget (typically about five minutes on a GPU).

It reads its own source code, forms a hypothesis for an improvement (such as changing the learning rate or the architecture depth), modifies the code, runs the experiment, and evaluates the results.

If validation loss, measured in bits per byte (val_bpb), improves, the change is kept; if not, the agent reverts and tries again. In one overnight run, Karpathy’s agent completed 126 experiments, driving loss down from 0.9979 to 0.9697.
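The loop is simple enough to sketch. The toy Python below is a hypothetical illustration of that keep-or-revert cycle, not Karpathy's actual 630-line script: the objective function, hyperparameter names, and numbers are invented stand-ins for a real budgeted training run.

```python
import copy
import random

def run_experiment(config):
    """Stand-in for a budgeted (~5 min) training run; returns a toy val_bpb."""
    # Invented objective: val_bpb improves as lr and depth approach an
    # unknown optimum (lr=3e-4, depth=12) — purely for illustration.
    return 0.96 + abs(config["lr"] - 3e-4) * 10 + abs(config["depth"] - 12) * 0.001

def propose(config):
    """The agent's 'hypothesis': mutate one hyperparameter."""
    candidate = copy.deepcopy(config)
    if random.random() < 0.5:
        candidate["lr"] *= random.choice([0.5, 2.0])
    else:
        candidate["depth"] = max(1, candidate["depth"] + random.choice([-1, 1]))
    return candidate

def research_loop(config, n_experiments=126, seed=0):
    random.seed(seed)
    best_bpb = run_experiment(config)
    for _ in range(n_experiments):
        candidate = propose(config)
        bpb = run_experiment(candidate)
        if bpb < best_bpb:          # improvement: keep the change
            config, best_bpb = candidate, bpb
        # otherwise: revert (i.e. keep the old config) and try again
    return config, best_bpb

config, bpb = research_loop({"lr": 1e-3, "depth": 10})
```

The ratchet structure is the whole trick: changes are only ever kept when the metric improves, so the script can run unattended for hundreds of iterations without regressing.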

Today, Karpathy revealed that after leaving the agent to tune a "depth=12" model for two days, it made roughly 700 autonomous changes.

The agent landed approximately 20 additive improvements that transferred fully to larger models. Stacked together, these changes dropped the "GPT-2 time" metric on the leaderboard from 2.02 hours to 1.80 hours – an 11% efficiency gain on a project Karpathy believed he had already well tuned.

"It’s strange to see the agent doing this entire workflow from start to finish all by itself…" Karpathy commented, noting that the agent caught mistakes in attention and regularization code that he had missed manually, despite his two decades of hands-on work.

This is more than just a productivity hack; it is a fundamental change in how intelligence is refined. By automating the scientific method for code, Karpathy has turned machine learning into an evolutionary process that runs at the speed of silicon rather than the speed of human thought.

And what’s more, it showed the broader AI and machine learning community on X that this kind of process can be applied beyond computer science, to areas like marketing, health, and, essentially, anything that requires research.

AutoResearch spreads

The response was swift and viral, with Karpathy’s post drawing more than 8.6 million views within two days as builders and researchers raced to extend the "Karpathy loop."

Varun Mathur, CEO of Hyperspace AI, an AI tool aggregator platform, took the single-agent loop and distributed it across a peer-to-peer network: each node running the Hyperspace Agent became an autonomous researcher.

On the night of March 8–9, 35 autonomous agents on the Hyperspace network ran 333 experiments, completely unsupervised. The results were a masterclass in constraint-driven strategy:

  • Hardware diversity as a feature: Mathur said that while H100 GPU agents could "brute force" their way to aggressive learning rates, the CPU-only agents on laptops were forced to get smarter. These "underdog" agents focused on initialization strategies (such as Kaiming and Xavier init) and normalization choices because they could not rely on raw throughput.

  • Gossip-based search: Using the GossipSub protocol, agents shared their wins in real time. When one agent discovered that Kaiming initialization reduced loss by 21%, the idea spread across the network like a digital virus. Within hours, 23 other agents had incorporated the discovery into their own hypotheses.

  • History compression: In just 17 hours, these agents independently rediscovered ML milestones – such as RMSNorm and bounded embeddings – that took human researchers at labs like Google Brain and OpenAI nearly eight years to formalize.
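The gossip dynamic described above is easy to simulate. The sketch below is a toy stand-in for the real GossipSub protocol on the Hyperspace network; the `Agent` class, the fanout value, and the idea names are all invented for illustration.

```python
import random

class Agent:
    """Toy autonomous researcher holding a pool of hypotheses."""
    def __init__(self, name):
        self.name = name
        self.hypotheses = {"baseline"}

    def discover(self, idea):
        self.hypotheses.add(idea)

    def receive(self, idea):
        # A real node would vet the gossiped idea before adopting it.
        self.hypotheses.add(idea)

def gossip(agents, idea, rounds=3, fanout=4, seed=0):
    """Each informed agent forwards the idea to `fanout` random peers per round."""
    rng = random.Random(seed)
    informed = {a for a in agents if idea in a.hypotheses}
    for _ in range(rounds):
        for a in list(informed):
            for peer in rng.sample(agents, fanout):
                peer.receive(idea)
                informed.add(peer)
    return informed

agents = [Agent(f"node-{i}") for i in range(24)]
agents[0].discover("kaiming-init")
informed = gossip(agents, "kaiming-init")
```

Because each informed node re-broadcasts every round, adoption grows multiplicatively — the "digital virus" behavior described above.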

Run 36,500 marketing experiments each year instead of 30

While ML purists focused on loss curves, the business world saw a different kind of revolution. Eric Siu, founder of advertising agency Single Grain, applied AutoResearch’s "experiment loop" to marketing.

"Most marketing teams run ~30 experiments per year," Siu wrote on X. "The next generation will run 36,500+. easily." He continued:

"They will run the experiment while sleeping. Current marketing teams run 20-30 experiments a year. If they’re ‘good’ maybe 52. New landing page. New ad creative. Maybe a subject line test. it has been considered "Data-driven marketing."
But the next generation of marketing systems will run 36,500+ experiments per year."

Siu’s framework swaps the training script for a marketing asset: a landing page, an ad creative, or a cold email. The agent modifies a variable (a subject line or a CTA), deploys it, measures the "positive response rate," and keeps or discards the change.
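That keep-or-discard cycle has the same shape as the ML loop. The sketch below is hypothetical: the variant lists, the response model, and the function names are invented, and a real system would deploy the asset and measure live responses instead of calling a simulated scorer.

```python
import random

# Illustrative variant pools for a cold-email asset (invented examples).
VARIANTS = {
    "subject": ["Quick question", "Your Q3 numbers", "An idea for your team"],
    "cta": ["Book a call", "Reply to learn more", "Grab the free audit"],
}

def measure_response_rate(asset, rng):
    """Stand-in for deploying the asset and measuring real responses."""
    base = 0.02
    base += 0.01 * VARIANTS["subject"].index(asset["subject"])
    base += 0.005 * VARIANTS["cta"].index(asset["cta"])
    return base + rng.gauss(0, 0.001)   # measurement noise

def experiment_loop(asset, n=100, seed=0):
    rng = random.Random(seed)
    best = measure_response_rate(asset, rng)
    for _ in range(n):
        field = rng.choice(list(VARIANTS))
        candidate = dict(asset, **{field: rng.choice(VARIANTS[field])})
        rate = measure_response_rate(candidate, rng)
        if rate > best:                 # keep the winning variant
            asset, best = candidate, rate
    return asset, best

asset, rate = experiment_loop({"subject": "Quick question", "cta": "Book a call"})
```

The accumulated history of which variants won and lost is exactly the "proprietary map" Siu describes: it lives in the experiment log, not in the code.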

Siu argues that this creates a "proprietary map" of what resonates with a specific audience – a moat built not from code, but from the history of experimentation. "The companies that win won’t just have better marketers," he wrote. "They will have faster experiment loops."

Community discussion and ‘spoiling’ the validation set

Despite the enthusiasm, GitHub discussions revealed a community grappling with the implications of such rapid, automated progress.

  • The over-optimization trap: Researcher Alexistual raised a pointed concern: "Aren’t you worried that launching so many experiments will eventually ‘spoil’ the validation set?" The fear is that, with enough agents, parameters get optimized for the specific quirks of the test data rather than for general intelligence.

  • The significance of the gains: User samionb questioned whether the drop from 0.9979 to 0.9697 was really worth noting. Karpathy’s response was typically straightforward: "All we are doing is optimizing performance per compute… these are real and substantial benefits."

  • The human element: One observation stood out: "The model got better by being simpler."

This insight – that less is often more – was achieved without a single human intervention.

The future: curiosity as the bottleneck

The release of AutoResearch suggests a future of research in which, thanks to simple AI instruction mechanisms, the human role shifts from "experimenter" to "experiment designer."

As tools like Darkmatter, Optimization Arena, and Nanoclaw emerge to support this swarm, the barrier to AI progress is no longer the "meat computer" (Karpathy’s description of the human brain) and its ability to code – it is our ability to define the constraints of discovery.

Andrej Karpathy has once again shifted the vibe. We are no longer just coding models; we are seeding ecosystems that learn while we sleep.



