New MiniMax M2.7 proprietary AI model is 'self-evolving' and can perform 30-50% of reinforcement learning research workflow

Gemini Generated Image hszkp8hszkp8hszk
Over the past few years, Chinese AI startup Minimax has become one of the most exciting in the crowded global AI market, building a reputation for delivering frontier-level large language models (LLM) and before that high-quality AI video generation models (HELLO) with open source licenses.

Today’s release of MiniMax M2.7 – a new proprietary LLM designed to thoroughly empower AI agents and serve as the backend to third-party harnesses and tools like Cloud Code, Kilo Code, and OpenClaw – marks a new milestone: Instead of relying solely on human-led fine-tuning, MiniMax has leveraged M2.7 to build, monitor, and optimize its own reinforcement learning harnesses.

This move toward iterative self-improvement signals a shift in the industry: a future where the models we use are as much the architects of their progress as they are the products of human research. The model is classified as a reasoned text-only model that provides intelligence compared to other leading systems while maintaining significantly higher cost efficiency.

However, with M2.7 now being proprietary, it is once again a sign that Chinese AI startups – for most of the last year, the standard-bearers in the world of open source AI frontiers, making them attractive to enterprises globally due to low (or no) cost and customization – are changing strategy and pursuing a more proprietary frontier model like US leaders like OpenAI, Google, and Anthropic have been doing for years.

Minimax becomes the second Chinese startup to release a proprietary cutting-edge LLM in recent months, following z.ai with its GLM-5 Turbo, and there are rumors that Alibaba’s Kuen team is also moving toward proprietary development in the wake of the departure of senior leadership and other researchers.

Technical Achievement: Self-Development Loop

The defining feature of the MiniMax M2.7 is its role in construction. According to company documentation, earlier versions of the model were used to create research agent harnesses capable of managing data pipelines, training environments, and evaluation infrastructure.

By autonomously triggering log-reading, debugging, and metric analysis, M2.7 managed between 30 percent and 50 percent of its own development workflow.

This is not just automation of rote tasks; The model optimized its own programming performance by analyzing failure trajectories and planning code modifications over iterative loops of 100 rounds or more.

"We deliberately trained the model to become better at planning and articulating requirements with the user," Skylar Miao, head of engineering at Minimax, explained on social network X. "The next step is a more complex user simulator to take this even further."

This capability extends to complex environments through MLE Bench Lite, a series of machine learning competitions designed to test autonomous research skills.

In these tests, the M2.7 achieved a medal rate of 66.6 percent The performance levels tied to Google’s new Gemini 3.1 are close to the current state-of-the-art benchmarks set by Anthropic’s Cloud Opus 4.6.

The goal, according to Minimax, is a transition toward complete autonomy in model training and inference architecture without human involvement.

Performance Evolution: Minimax M2.7 vs M2.5

When compared to its predecessor, M2.5, released in February 2026, the M2.7 model shows significant advantages in high-risk software engineering and professional office tasks.

While M2.5 was celebrated for polyglot code mastery, M2.7 is designed for real-world engineering – for tasks that require causal reasoning within live production systems.

Key performance metrics include:

  • software engineering: M2.7 scored 56.22 percent on the SWE-Pro benchmark, matching the highs of global competitors like GPT-5.3 codecs.

  • Professional Office Delivery: In document processing, M2.7 achieved an Elo score of 1495 on GDPval-AA, which the company claims is the highest among open-source-accessible models.

  • Reduction in hallucinations: The model scores plus one on the AA-Omniscience Index, a huge jump from the M2.5’s negative 40 score.

  • Hallucination Rate: The M2.7 achieves a hallucination rate of 34 percent, which is lower than the 46 percent rate for Cloud Sonnet 4.6 and the 50 percent rate for Gemini 3.1 Pro Preview.

  • System Understanding: On Terminal Bench 2, the model scored 57.0 percent, demonstrating a deep understanding of complex operational logic rather than simple code generation.

  • Skill Follow up: On the MM Claw assessment, which tests 40 complex skills worth more than 2,000 tokens each, the M2.7 maintained a 97 percent adherence rate, a substantial improvement over the M2.5 baseline.

  • Intelligence Parity: The model’s reasoning capabilities are considered equivalent to GLM-5, yet it uses 20 percent fewer output tokens to achieve the same results.

The evolution of the model is evidenced by its score of 50 on the Artificial Analysis Intelligence Index, which represents an 8-point improvement over its predecessor in just one month, and ranks 8th globally in terms of its overall intelligence in benchmarking tasks across various domains.

Not all independent, third-party benchmarks show improvement for M2.7 compared to M2.5: on Bridgebench, a set of tasks designed by agent AI coding startup Bridgemind to test model performance "vibe coding," or converting natural language into working code, M2.5 took 12th place while M2.7 took 19th place.

Access, Pricing and Integration

MiniMax M2.7 is a proprietary model available through the MiniMax API and MiniMax Agent creation platform. While the main model waits for M2.7 are closed, the company continues to contribute to the ecosystem through the open-source interactive project OpenRoom.

Thanks to direct API integration and through third-party provider OpenRouter, MiniMax M2.7 maintains a cost-leading price point of $0.30 per 1 million input tokens and $1.20 per 1 million output tokens, which is unchanged from M2.5’s pricing.

To support different usage scales and modalities, Minimax offers a structured token scheme with different subscription tiers. These plans allow users to access models of text, speech, video, image and music under a single unified quota.

To further drive adoption, Minimax has launched an Invite and Earn referral program, which offers a 10 percent discount to new invitees and a 10 percent discount voucher to those who refer.

Monthly Standard Token Plan Pricing: The standard monthly tiers are designed for everyone from entry-level developers to heavy regular users.

  • Starter: $10 per month for 1,500 requests per 5 hours.

  • Plus: $20 per month for 4,500 requests per 5 hours.

  • Maximum: $50 per month for 15,000 requests per 5 hours.

Monthly High-Speed ​​Token Plan Pricing: For production-level workloads requiring the M2.7-highspeed variant, the following tiers are available:

  • Plus-HighSpeed: $40 per month for 4,500 requests per 5 hours.

  • Max-HighSpeed: $₹80 per month for 15,000 requests per 5 hours.

  • Ultra High Speed: $150 per month for 30,000 requests per 5 hours.

Annual Token Plan Pricing: Annual subscriptions offer significant discounts for a long-term commitment:

  • Standard Starter: $100 per year (saves $20).

  • Standard Plus: $200 per year (saves $40).

  • Standard Max: $500 per year ($100 saved).

  • Hi-Speed ​​Plus: $400 per year ($80 saved).

  • Hi-Speed ​​Max: $800 per year ($160 saved).

  • High-Speed ​​Ultra: $1,500 per year ($300 saved).

A request in these plans is approximately equivalent to a call on the MiniMax M2.7, although other models in the suite, such as video or high-definition speech, consume requests at a higher rate.

official device integration

To ensure seamless adoption, MiniMax has provided official documentation to integrate M2.7 into over 11 major developer tools and agent harnesses.

This includes widely used platforms such as Cloud Code, Cursor, Try, and Jade. Other officially supported tools include OpenCode, Kilo Code, Kline, Roo Code, Droid, Grok CLI, and Codex CLI.

Additionally, the model supports the Model Context protocol, which allows it to natively use tools such as web search and understand images for multimodal reasoning. Developers using the Anthropic SDK can easily integrate M2.7 by modifying ANTHROPIC_BASE_URL to point to the Minimax endpoint.

When using MiniMax as a provider in tools like OpenClaw, image understanding capabilities are automatically configured through the model’s VLM API endpoint, requiring no additional setup from the user.

With its deep bench of integrations and its pioneering approach to iterative self-development, MiniMax M2.7 represents a significant step towards an AI-native future where models are as involved in their progress as the humans guiding them.

Strategic implications for enterprise decision makers

Technology decision makers should interpret the M2.7 release as evidence that agentic AI has moved from theoretical prototypes to production-ready utility.

The model’s ability to reduce recovery time to live production incidents to under three minutes by autonomously correlating monitoring metrics with code repositories suggests a paradigm shift for SRE and DevOps teams.

Enterprises currently facing pressure to adopt AI-driven efficiencies must decide whether they are satisfied with AI as a sophisticated assistant or whether they are ready to integrate native agent teams capable of end-to-end complete project delivery.

From a financial perspective, M2.7 represents a significant breakthrough in cost efficiency for high-level logic. Analysis indicates that the cost to operate the M2.7 is less than one third that of the GLM-5 at equivalent intelligence levels.

For example, running the standard intelligence index costs $176 on M2.7, compared to $547 for GLM-5 and $371 for KM K2.5. This aggressive pricing strategy places the M2.7 on the Pareto frontier of the intelligence versus cost chart, offering enterprise-level logic at a fraction of the market rate.

The current market is filled with high-performance models, many of which still have a slight edge in general logic scores. But M2.7’s distinctive optimization for Office suite fidelity in Excel, PPT and Word and its high performance in the GDPVal-AA benchmark make it a primary candidate for organizations focused on professional document workflows and financial modeling.

Decision makers must weigh the benefits of a general-purpose Frontier model against a specialized engine like the M2.7, which is built to interact with complex internal scaffolding and toolsets.

Finally, the fact that it is offered by a Chinese company (headquartered in Shanghai) and is subject to the laws of that country in addition to the user’s country, and is not yet available for offline or local use, may make it a tough sell for enterprises operating in the US and the West – especially those in highly-regulated or government-facing industries.

Nonetheless, the shift toward self-developed models suggests that the ROI of AI investments will increasingly be linked to the recurring benefits of the system.

Organizations that adopt models capable of harnessing their own refinements may find themselves on a faster iteration curve than those relying on static, human-only refinements. With the aggressive integration of Minimax into the modern developer stack, the barrier to testing these autonomous workflows has dropped significantly, increasing the pressure on competitors to deliver the same native agent capabilities.



<a href

Leave a Comment