New AI framework autonomously optimizes training data, architectures and algorithms — outperforming human baselines

Automated AI R&D
AI R&D runs on a cycle of hypothesis, experimentation and analysis – each step requiring substantial manual engineering effort. A new framework from researchers at SII-GAIR aims to remove that bottleneck by automating the full optimization loop for training data, model architectures and learning algorithms.

The new framework, called ASI-EVOLVE and developed by researchers at the Generative Artificial Intelligence Research Lab (SII-GAIR), aims to clear this hurdle. Designed as an agentic system for AI-for-AI research, it runs continuous “Learn-Design-Experiment-Analyze” cycles to automate optimization of the core AI stack.

In experiments, this self-improving loop autonomously discovered new designs that outperformed state-of-the-art human baselines. The system designed new language model architectures, improved pretraining data pipelines to lift benchmark scores by more than 18 points, and devised highly efficient reinforcement learning algorithms.

For enterprise teams running repeated optimization cycles on their AI systems, the framework offers a path to reduce manual engineering overhead while matching or exceeding the performance of human-designed baselines.

Data and design constraints

Engineering teams can only explore a small portion of the vast potential design space for AI models at any time. Executing experimental workflows requires expensive manual effort and frequent human intervention. And the insights gained from these costly cycles often remain locked away as personal intuition or experience, making it difficult to systematically preserve that knowledge and transfer it to future projects or other teams. These barriers fundamentally limit the pace and scale of AI innovation.

AI has made incredible strides in scientific discovery, from AlphaFold solving long-standing problems in biology to agentic systems that answer fundamental scientific questions. However, current frameworks still struggle with open-ended AI innovation and are mostly limited to narrow customization within specific constraints.

Advancing core AI capabilities is far more complex. It requires modifying large, interdependent codebases, running computation-heavy experiments that consume tens to hundreds of GPU hours, and analyzing multi-dimensional feedback from training dynamics.

“Existing frameworks have not yet demonstrated that AI can operate effectively in this regime in an integrated manner, nor generate meaningful progress across the three fundamental pillars of AI development,” the researchers wrote.

How ASI-EVOLVE learns to do research

To overcome the limitations of manual R&D, ASI-EVOLVE operates as a continuous loop spanning prior knowledge, hypothesis generation, experimentation and refinement. The system draws relevant knowledge and historical experience from existing databases, designs a candidate program representing its next hypothesis, runs experiments to obtain evaluation signals, and distills the results into reusable, human-readable texts that it feeds back into its knowledge base.
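
In pseudocode, that cycle reduces to a compact loop. The sketch below is a minimal illustration of the flow; every class, function and field name in it is an assumption made for readability, not ASI-EVOLVE’s actual API.

```python
"""Minimal sketch of a Learn-Design-Experiment-Analyze loop.
All names here are illustrative assumptions, not ASI-EVOLVE's real API."""

import random


class CognitionBase:
    """Persistent store of prior knowledge and distilled insights."""

    def __init__(self, seed_knowledge):
        self.entries = list(seed_knowledge)

    def retrieve(self):
        # Learn: surface the most recent insights to guide the next hypothesis.
        return self.entries[-3:]

    def store(self, insight):
        self.entries.append(insight)


def design_candidate(context):
    """Researcher step: propose a candidate program from prior context (stubbed)."""
    return {"id": random.randint(0, 999_999), "based_on": context}


def run_experiment(candidate):
    """Engineer step: execute the candidate, return evaluation signals (stubbed)."""
    return {"candidate": candidate["id"], "score": random.random()}


def analyze(results):
    """Analyzer step: distill raw results into a reusable, human-readable insight."""
    return f"candidate {results['candidate']} scored {results['score']:.2f}"


def research_loop(iterations=5):
    base = CognitionBase(["heuristics and pitfalls mined from the literature"])
    for _ in range(iterations):
        context = base.retrieve()               # Learn
        candidate = design_candidate(context)   # Design
        results = run_experiment(candidate)     # Experiment
        base.store(analyze(results))            # Analyze, then feed back


if __name__ == "__main__":
    research_loop()
```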

Two key components drive ASI-EVOLVE. The “cognition base” serves as the system’s foundational domain expertise. To speed up the search, it is pre-loaded with human knowledge, task-relevant heuristics, and known pitfalls extracted from the existing literature, steering exploration toward promising directions from the very first iteration.

The second component is the “analyzer”, which deals with complex, multidimensional feedback from experiments. It processes raw training logs, benchmark results and efficiency traces, distilling them into compact, actionable insights and causal analysis.

Several other supplemental modules bring the framework together. A “researcher” agent reviews prior knowledge from the cognition base and previous experimental results to generate new hypotheses, either proposing localized code modifications or writing new programs.

The “engineer” component runs the actual experiments. Because AI training runs are incredibly expensive, the engineer is equipped with efficiency measures such as wall-clock limits and early-rejection fast tests that filter out flawed candidate programs before they consume excessive GPU hours.
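
A minimal sketch of those guardrails might look like the following, with assumed thresholds and a stubbed training step standing in for a real run; none of the constants come from the paper.

```python
"""Sketch of engineer-style guardrails: a wall-clock cap plus early rejection.
The thresholds and the training stub are illustrative assumptions only."""

import math
import time

WALL_CLOCK_LIMIT_S = 3600   # hard time cap per candidate (assumed value)
SMOKE_TEST_STEPS = 200      # short accelerated test before any full run (assumed)
LOSS_DIVERGENCE_CAP = 1e3   # reject if the loss explodes past this (assumed)


def train_step(step):
    """Stub standing in for one real training step; returns a loss value."""
    return 2.0 / (step + 1)


def run_candidate():
    """Run a candidate under budget guards; return (accepted, last_loss)."""
    start = time.monotonic()
    loss = float("inf")
    for step in range(SMOKE_TEST_STEPS):
        loss = train_step(step)
        # Early rejection: a NaN or exploding loss marks a flawed candidate.
        if math.isnan(loss) or loss > LOSS_DIVERGENCE_CAP:
            return False, loss
        # Wall-clock guard: stop before the run eats excessive GPU hours.
        if time.monotonic() - start > WALL_CLOCK_LIMIT_S:
            return False, loss
    return True, loss


if __name__ == "__main__":
    accepted, final_loss = run_candidate()
    print(f"accepted={accepted}, final_loss={final_loss:.4f}")
```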

Finally, the “database” serves as the system’s persistent memory, storing the code, research inspiration, raw results, and the analyzer’s final report for each iteration, ensuring that insights accumulate and compound over time.
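
Based on that description, each database entry might resemble a simple record like the one sketched below; the field names are assumptions, not the framework’s actual schema.

```python
"""Sketch of a per-iteration database record; field names are assumed."""

from dataclasses import dataclass


@dataclass
class IterationRecord:
    iteration: int        # which cycle of the loop produced this entry
    candidate_code: str   # the program the researcher proposed
    inspiration: str      # the hypothesis or prior insight it builds on
    raw_results: dict     # benchmark scores, logs, efficiency traces
    analyzer_report: str  # the distilled, human-readable insight
```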

By integrating these components, ASI-EVOLVE ensures that an AI agent systematically learns from complex, real-world experimental feedback without the need for continuous human intervention.

Whereas previous frameworks are designed to develop candidate solutions, “ASI-EVOLVE develops cognition itself,” the researchers write. “Accumulated experience and distilled insights are continuously stored and retrieved to inform future exploration, ensuring that the system grows not only in the quality of its solutions but also in its ability to think about where to search next.”

ASI-EVOLVE in action

In their experiments, the researchers demonstrated that ASI-EVOLVE can successfully improve data curation, model architecture, and learning algorithms to build better AI systems.

For real-world enterprise applications, high-quality data is a persistent hurdle. When tasked with designing category-specific cleaning strategies for large-scale pretraining corpora, ASI-EVOLVE inspected data samples and diagnosed quality issues such as HTML artifacts and formatting inconsistencies. The system autonomously generated custom curation rules; the results showed that systematic cleaning combined with domain-aware preservation is far more effective than aggressive filtering.
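
As a rough illustration, a category-specific cleaning rule of the kind described could look like the sketch below, which strips HTML artifacts but leaves whitespace alone for code-category documents. The regexes and the code carve-out are assumptions, not rules the system actually produced.

```python
"""Sketch of a category-aware cleaning rule; patterns are illustrative only."""

import html
import re

HTML_TAG = re.compile(r"<[^>]+>")
EXTRA_WS = re.compile(r"[ \t]{2,}")


def clean_document(text: str, category: str) -> str:
    """Remove HTML artifacts and formatting noise, preserving domain structure."""
    text = html.unescape(text)      # decode entities such as &amp;
    text = HTML_TAG.sub(" ", text)  # strip leftover markup
    if category != "code":
        # Whitespace is meaningful in code, so only collapse it elsewhere.
        text = EXTRA_WS.sub(" ", text)
    return text.strip()


if __name__ == "__main__":
    raw = "<div>Systematic   cleaning &amp; preservation</div>"
    print(clean_document(raw, category="web"))
    # -> "Systematic cleaning & preservation"
```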

In benchmark tests, a 3B-parameter model trained on AI-curated data showed an increase of about 4 points in average score compared to a model trained on raw data. The gains were greatest in knowledge-intensive tasks, with performance increasing by more than 18 points on Massive Multitask Language Understanding (MMLU), an LLM benchmark that covers tasks in STEM, humanities, and social sciences.

Beyond data, the system proved highly capable at neural architecture design. Over 1,773 autonomous exploration rounds, it generated 105 new linear attention architectures that outperformed DeltaNet, a highly efficient human-designed baseline. To achieve these results, ASI-EVOLVE developed a multi-scale routing mechanism that dynamically adjusts the model’s computational budget based on the specific content of the input.
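
That description points to a gate that weights several temporal scales per token based on content. The PyTorch sketch below illustrates the general idea with an assumed design (three fixed scales and a softmax gate); it is not the architecture ASI-EVOLVE actually discovered.

```python
"""Generic sketch of content-dependent multi-scale routing; not the real design."""

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleRouter(nn.Module):
    """Mixes features computed at several temporal scales, weighted per token."""

    def __init__(self, dim: int, scales=(1, 4, 16)):
        super().__init__()
        self.scales = scales
        self.gate = nn.Linear(dim, len(scales))  # content-dependent routing weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        branches = []
        for s in self.scales:
            if s == 1:
                branches.append(x)  # fine-grained branch: the raw features
            else:
                # Coarse branch: causal average over the previous s tokens.
                padded = F.pad(x.transpose(1, 2), (s - 1, 0))
                pooled = F.avg_pool1d(padded, kernel_size=s, stride=1)
                branches.append(pooled.transpose(1, 2))
        stacked = torch.stack(branches, dim=2)     # (batch, seq, scales, dim)
        weights = F.softmax(self.gate(x), dim=-1)  # (batch, seq, scales)
        return (weights.unsqueeze(-1) * stacked).sum(dim=2)


if __name__ == "__main__":
    router = MultiScaleRouter(dim=64)
    tokens = torch.randn(2, 32, 64)
    print(router(tokens).shape)  # torch.Size([2, 32, 64])
```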

Finally, in reinforcement learning algorithm design, ASI-EVOLVE discovered novel optimization mechanisms, designing algorithms that outperform GRPO baselines on challenging mathematical reasoning benchmarks such as AMC23 and AIME24. One successful variant introduced a “budget-constrained dynamic radius,” which keeps model updates within a set budget, effectively stabilizing training on noisy data.
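
One plausible reading of that mechanism is a GRPO-style clipped objective whose clip radius shrinks as a fixed update budget is spent. The sketch below illustrates that interpretation; the budget rule and all constants are assumptions, not the algorithm the paper reports.

```python
"""Sketch of a budget-constrained dynamic clip radius; constants are assumed."""

import torch

BASE_RADIUS = 0.2     # typical PPO/GRPO-style clip range (assumed)
MIN_RADIUS = 0.05     # floor so learning never fully stalls (assumed)
UPDATE_BUDGET = 1.0   # total allowed policy movement per window (assumed)


def dynamic_radius(spent_budget: float) -> float:
    """Shrink the clip radius as the recent update budget is used up."""
    remaining = max(0.0, 1.0 - spent_budget / UPDATE_BUDGET)
    return MIN_RADIUS + (BASE_RADIUS - MIN_RADIUS) * remaining


def clipped_objective(ratio: torch.Tensor, advantage: torch.Tensor,
                      spent_budget: float) -> torch.Tensor:
    """PPO-style clipped surrogate with a budget-dependent radius."""
    r = dynamic_radius(spent_budget)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - r, 1.0 + r) * advantage
    return torch.minimum(unclipped, clipped).mean()


if __name__ == "__main__":
    ratio = torch.tensor([0.8, 1.0, 1.5])       # new/old policy probability ratios
    advantage = torch.tensor([1.0, -0.5, 2.0])  # group-relative advantages
    print(clipped_objective(ratio, advantage, spent_budget=0.6))
```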

What this means for enterprise AI

Enterprise AI workflows constantly require adaptations to existing systems, from fine-tuning open-source models on proprietary data to making small changes to architectures and algorithms. The computational resources and engineering hours such efforts typically require are enormous and beyond the reach of most organizations. As a result, many end up running unoptimized versions of standard AI models.

The research team says the framework is designed to let enterprises integrate proprietary domain knowledge into the cognition base and run the autonomous loop on their internal AI systems.

The research team has open-sourced the ASI-EVOLVE code, providing infrastructure for developers and product builders.


