
Liquid AI, founded by former MIT computer scientists, today released its smallest AI language model, LFM2.5-230M, and enterprises would do well to consider it for data extraction and local deployment on smartphones, laptops, and robots.
It’s a 230-million-parameter foundation model explicitly designed for on-device agentic workflows, and as Liquid said in its release blog post, the small size makes it possible to run virtually "Anywhere." According to Liquid, it outperforms models more than 4X its size on selected benchmarks, notably performing better in data extraction than the 800-million-parameter Alibaba Qwen3.5-0.8B (Instruct) and the 1-billion-parameter Google Gemma 3 1B.
This model targets developers and engineers building lightweight data extraction pipelines and autonomous edge systems.
Operating under a dual-use commercial license, this model remains free for individuals and companies generating less than $10 million in annual revenue, while larger corporations require a paid enterprise agreement.
This release differentiates itself from other small AI models by using the LFM2 architecture to achieve high inference speeds without the huge memory overhead typical of parameter-heavy Transformers.
While leading AI companies such as Anthropic, OpenAI, Google, Microsoft, and Meta push parameter counts into the hundreds of billions or trillions to achieve cutting-edge performance, a parallel race focuses entirely on edge and local deployments.
Liquid AI’s launch of the LFM2.5-230M signals a significant shift toward architectural efficiency over brute-force scaling. By squeezing 19 trillion tokens of pre-training into a 230-million-parameter footprint, the company demonstrates that edge devices do not need massive computational power or persistent cloud connections to execute complex, multi-step agentic workflows.
How LFM2.5-230M works
The LFM2.5-230M model departs from the standard transformer architecture, relying instead on the LFM2 framework. This architecture acts as a hybrid system, interleaving gated short-range convolution with grouped-query attention to efficiently process information.
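To make the idea of a gated short-range convolution concrete, here is a minimal pure-Python sketch. It is not Liquid AI's implementation: the sigmoid gate, the scalar inputs, the kernel values, and the `layer_schedule` interleaving ratio are all assumptions chosen for illustration.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def causal_short_conv(xs, kernel):
    """Causal 1-D convolution: output t only sees inputs t, t-1, ..., t-k+1."""
    out = []
    for t in range(len(xs)):
        acc = 0.0
        for j, w in enumerate(kernel):
            if t - j >= 0:
                acc += w * xs[t - j]
        out.append(acc)
    return out


def gated_short_conv(xs, kernel, gate_weight=1.0):
    """Scale each convolved value by an input-dependent gate in (0, 1).

    The sigmoid gate is an assumed, simplified gating form.
    """
    conv = causal_short_conv(xs, kernel)
    return [sigmoid(gate_weight * x) * c for x, c in zip(xs, conv)]


def layer_schedule(n_layers, attention_every):
    """Hypothetical interleaving: one attention block every `attention_every` layers."""
    return ["attention" if (i + 1) % attention_every == 0 else "conv"
            for i in range(n_layers)]


print(causal_short_conv([1.0, 2.0, 3.0], [1.0, 1.0]))
print(layer_schedule(6, 3))
```

Because each convolution output depends only on a fixed, short window of past inputs, its cost grows linearly with sequence length, which is the property the hybrid design relies on for most of its layers.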
For those following the development of efficient architectures, Liquid’s approach pursues a familiar goal: handling long contexts and sequential data on edge hardware without the quadratic memory cost of pure attention mechanisms. The model supports a 32K context window, allowing it to take in substantial documents or continuous streams of robotic telemetry.
The performance charts provided in the release make the architectural efficiency evident. The model maintains a memory footprint of less than 400 MB while achieving prefill and decode speeds that surpass comparable models such as the Gemma 3 1B IT and Granite 4.0-H-350M.
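As a rough sanity check on the sub-400 MB figure, the weights alone can be estimated from the parameter count. The bit widths below are assumptions; the article does not state which precision or quantization the reported footprint uses, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_megabytes(n_params: int, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return n_params * bits_per_weight / 8 / 1_000_000


N_PARAMS = 230_000_000  # parameter count reported for LFM2.5-230M

for bits in (4, 8, 16):  # assumed precisions, for comparison only
    print(f"{bits}-bit weights: {weight_megabytes(N_PARAMS, bits):.0f} MB")
```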
On the Samsung Galaxy S25 Ultra equipped with a Qualcomm Snapdragon Gen4 CPU, the model reaches a decode speed of 213 tokens per second. Even on the highly restricted Raspberry Pi 5, the model maintains a decode rate of 42 tokens per second. Additionally, internal benchmarking shows that the GPU inference stack provides lower end-to-end latency than competing smaller models at all concurrency levels.
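To put the decode rates in practical terms, the sketch below converts tokens per second into the time needed to generate a reply. The 500-token reply length is an assumed example, and the calculation ignores prefill time and any other overhead.

```python
def decode_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_second


REPLY_TOKENS = 500  # assumed reply length, for illustration only

# Decode rates as reported in the article.
for device, rate in (("Galaxy S25 Ultra", 213), ("Raspberry Pi 5", 42)):
    print(f"{device}: {decode_seconds(REPLY_TOKENS, rate):.1f} s")
```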
Why does this matter to enterprises?
To understand why a 230-million-parameter model is necessary, one has to look at how enterprises currently manage data.
Organizations have traditionally relied on rigid, rule-based extract, transform, load (ETL) scripts to move and process data. However, these legacy systems are extremely fragile: a simple change in the layout of a document or a schema update can break the entire pipeline.
The industry is moving to solve this with "AI ETL," where machine learning infers mappings, detects schema drift, and automatically adapts to changes. In a modern lightweight data extraction pipeline, an AI model connects to unstructured sources such as PDFs, emails, or web forms and structures the data into formats such as JSON without the need for hardcoded rules.
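In such a pipeline, the model's output still has to be checked before it enters downstream systems. The sketch below shows one way that check might look; the invoice schema, field names, and `sample_reply` string are hypothetical stand-ins, and no actual model call is made.

```python
import json

# Hypothetical target schema for an invoice; field names are illustrative.
INVOICE_SCHEMA = {
    "invoice_number": str,
    "vendor": str,
    "total": float,
    "currency": str,
}


def parse_extraction(raw: str, schema: dict) -> dict:
    """Parse a model's JSON reply and check it against the expected schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    record = {}
    for key, expected_type in schema.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        # Accept integers where a float is expected (e.g. a total of 1200).
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected_type):
            raise ValueError(f"field {key!r} should be {expected_type.__name__}")
        record[key] = value
    return record


# Stand-in for what an extraction model might return for one invoice.
sample_reply = '{"invoice_number": "INV-001", "vendor": "Acme", "total": 1200, "currency": "USD"}'
print(parse_extraction(sample_reply, INVOICE_SCHEMA))
```

Rejecting replies that fail validation, rather than patching them silently, keeps a layout or schema change from corrupting records unnoticed.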
For enterprises, it is economically unviable to use a massive flagship model like Claude Opus 4.6 (which costs $5.00 per million input tokens) to parse regular invoices, format addresses, or route telemetry data.
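The scale of that cost is easy to estimate from the quoted input price. The document volume and tokens-per-document figures below are assumptions picked for illustration, and output-token charges are left out.

```python
def input_cost_usd(total_input_tokens: int, usd_per_million_tokens: float) -> float:
    """Input-side API cost for a given number of tokens."""
    return total_input_tokens / 1_000_000 * usd_per_million_tokens


PRICE_PER_MILLION = 5.00      # input price quoted in the article
DOCS_PER_MONTH = 1_000_000    # assumed workload
TOKENS_PER_DOC = 2_000        # assumed average document length

monthly_tokens = DOCS_PER_MONTH * TOKENS_PER_DOC
print(f"${input_cost_usd(monthly_tokens, PRICE_PER_MILLION):,.2f} per month")
```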
This is where models like the LFM2.5-230M become important. Explicitly designed as a lightweight extraction engine, it allows companies to automate repetitive formatting and data parsing at a fraction of the compute cost and latency, running directly on local hardware rather than relying on expensive, constant cloud API calls.
Small model benchmarks: LFM vs. the 3B class
The AI industry is seeing a renaissance of "small" models in mid-2026, but the definition of "small" varies wildly.
Recently, the open-source community was stunned by Weibo’s VibeThinker-3B, a 3-billion-parameter model built on a Qwen2-style backbone, which achieved a whopping score of 94.3 on the AIME 2026 mathematics benchmark, rivaling 600-billion-parameter giants through aggressive data curation and reinforcement learning.
Similarly, Google’s Gemma 4 family – which recently surpassed 200 million downloads – pushes frontier AI to the edge, including E2B (2 billion parameters) designed specifically for mobile and IoT deployments.
In contrast, Liquid AI’s LFM2.5-230M operates in a completely different weight class. At just 230 million parameters, it is about one-tenth the size of Google’s smallest Gemma 4 model and VibeThinker-3B.
Due to its tiny footprint, the LFM2.5-230M is not designed to compete on logic-heavy workloads like advanced math, coding, or creative writing – a constraint that Liquid AI clearly acknowledges.
However, in its intended domain of data extraction and tool calling, the model punches well above its weight class.
Benchmarks released by Liquid AI show that the LFM2.5-230M scored 43.26 on the BFCLv3 tool-use benchmark, beating IBM’s Granite 4.0-350M (39.58) and far outperforming larger 1-billion-parameter models like Google’s Gemma 3 1B IT (16.61).
On CaseReportBench for data extraction, it scores 22.51, outperforming Qwen3.5-0.8B (Instruct).
LFM2.5-230M shows that while 3-billion-parameter models like VibeThinker tackle advanced math, a 230-million-parameter model can be the better, highly optimized choice for executing structured tool calls and running agentic pipelines efficiently on constrained hardware.
Advanced research use
Because it excels at tool calling, the LFM2.5-230M primarily serves as a skill-selection layer. Liquid AI demonstrated this capability by deploying the model on the Unitree G1 humanoid robot.
The model, running entirely on-device via the robot’s onboard NVIDIA Jetson Orin compute module, successfully processes complex environmental commands.
As explained in the company’s tech blog, the model takes a free-form instruction such as *"Hold still for 2 seconds, then walk forward at a speed of 1 meter per second for 3 meters, bend one leg forward at the knee for 5 seconds, and walk backward at a speed of 0.5 meters per second for 3 meters"* and automatically converts it into a structured multi-step plan built from pre-trained low-level skills provided by NVIDIA’s SONIC framework.
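A structured plan for that instruction might look something like the sketch below. The skill names, the `Step` fields, and the validation rules are hypothetical; they do not reflect the SONIC framework's actual interface or the format Liquid AI's model emits.

```python
from dataclasses import dataclass

# Hypothetical skill vocabulary; the real low-level skill set is not given in the article.
ALLOWED_SKILLS = {"hold_still", "walk", "bend_knee"}


@dataclass(frozen=True)
class Step:
    skill: str
    duration_s: float
    speed_mps: float = 0.0  # signed: a negative value means walking backward


def walk_step(distance_m: float, speed_mps: float, backward: bool = False) -> Step:
    """Turn a distance and speed into a timed walking step."""
    duration = distance_m / speed_mps
    return Step("walk", duration, -speed_mps if backward else speed_mps)


# Hand-written plan matching the quoted instruction.
PLAN = [
    Step("hold_still", 2.0),
    walk_step(3.0, 1.0),
    Step("bend_knee", 5.0),
    walk_step(3.0, 0.5, backward=True),
]


def validate(plan) -> None:
    """Reject steps that use unknown skills or non-positive durations."""
    for step in plan:
        if step.skill not in ALLOWED_SKILLS:
            raise ValueError(f"unknown skill: {step.skill}")
        if step.duration_s <= 0:
            raise ValueError(f"invalid duration for {step.skill}")


def total_duration(plan) -> float:
    return sum(step.duration_s for step in plan)


validate(PLAN)
print(total_duration(PLAN))
```

Constraining the model to a closed set of skills means its output can be validated before anything is sent to the robot's controllers.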
Base and post-trained models are immediately available on Hugging Face, with day-one support for llama.cpp (GGUF), MLX, vLLM, SGLang, and ONNX in the inference ecosystem.
Dual-use, custom LFM open license
Liquid AI ships the LFM2.5-230M under the LFM Open License v1.0. Despite the word "open" in the title, this is not an Open Source Initiative (OSI) compliant license; it operates as a restricted, dual-use commercial structure.
For independent developers, researchers, and early-stage startups, the license functions similarly to open-source software.
Users receive a perpetual, worldwide, royalty-free license to reproduce, modify, and distribute the models, provided they retain the original copyright notice and prominently disclose any modifications.
However, the license includes a strict "commercial use limit." Any legal entity generating $10 million or more in annual revenue loses the right to use the model commercially under the agreement.
Larger enterprises that exceed this financial threshold must negotiate a separate, paid commercial agreement with Liquid AI to deploy models in production.
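The revenue rule as summarized here reduces to a simple check. This sketch encodes only the article's description, not the license text itself, and it assumes that non-commercial use by a larger entity does not trigger the paid agreement.

```python
REVENUE_THRESHOLD_USD = 10_000_000  # threshold as reported in the article


def needs_paid_agreement(annual_revenue_usd: float, commercial_use: bool) -> bool:
    """True when the summarized terms would call for a separate commercial deal."""
    # Assumption: only commercial use at or above the threshold is affected.
    return commercial_use and annual_revenue_usd >= REVENUE_THRESHOLD_USD


print(needs_paid_agreement(9_999_999, commercial_use=True))
print(needs_paid_agreement(10_000_000, commercial_use=True))
```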
This strategy protects the company from having its intellectual property absorbed for free by major technology groups, while still seeding the model at the grassroots developer level.