
When Liquid AI, a startup founded by MIT computer scientists in 2023, introduced its Liquid Foundation Model Series 2 (LFM2) in July 2025, the pitch was straightforward: deliver the fastest on-device foundation model on the market, built on the company’s new “liquid” architecture, with training and inference efficiency that makes small models a serious alternative to cloud-only large language models (LLMs) such as OpenAI’s GPT series and Google’s Gemini.
The initial release shipped dense checkpoints at 350M, 700M and 1.2B parameters, a hybrid architecture heavy on gated short convolutions, and benchmark numbers that put LFM2 ahead of similarly sized rivals like Qwen3, Llama 3.2 and Gemma 3 on both quality and CPU throughput. The message to enterprises was clear: real-time, privacy-preserving AI on phones, laptops and vehicles no longer has to sacrifice capability for latency.
In the months since that launch, Liquid has expanded LFM2 into a broader product line – adding task- and domain-specific variants, smaller multimodal models for vision and audio, and an edge-centric deployment stack called LEAP – and positioned the models as a control layer for on-device and on-premises agentic systems.
Now, with the publication of a detailed, 51-page LFM2 technical report on arXiv, the company is going a step further: making public the architecture discovery process, training data mixture, distillation objective, curriculum strategy and post-training pipeline behind those models.
And unlike many earlier open releases, LFM2 is built around a repeatable recipe: a hardware-in-the-loop search process, a training curriculum that compensates for the small parameter budget, and a post-training pipeline tuned for instruction adherence and tool use.
Rather than simply offering weights and an API, Liquid is effectively publishing a detailed blueprint that other organizations can use as a reference to train their own smaller, efficient models tailored to the constraints of their own hardware and deployment.
A model family designed around real constraints, not around GPU labs
The technical report starts from a premise that enterprises are deeply familiar with: real AI systems reach limits long before benchmarks. Latency budgets, peak memory ceilings and thermal throttling define what can actually run in production – especially on laptops, tablets, commodity servers and mobile devices.
To address this, Liquid AI performed architecture discovery directly on target hardware, including Snapdragon mobile SoCs and Ryzen laptop CPUs. The result is consistent across all sizes: a minimal hybrid architecture dominated by gated short convolution blocks plus a small number of grouped-query attention (GQA) layers. This design was repeatedly chosen over more exotic linear-attention and SSM hybrids because it delivered a better quality-latency-memory Pareto profile under real device conditions.
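To make the shape of that design concrete, here is a minimal PyTorch sketch of the two building blocks the report names: a gated short-convolution mixer and a grouped-query attention layer. Dimensions, kernel width and head counts are illustrative assumptions, and this is not Liquid’s actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedShortConvBlock(nn.Module):
    """Gated short-convolution mixer: a depthwise conv over a short window,
    gated elementwise by a learned projection of the input."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)                      # values + gate
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim,
                              padding=kernel_size - 1)              # depthwise conv
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                                           # x: (batch, seq, dim)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        # symmetric pad + right trim keeps the convolution causal
        v = self.conv(v.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.out_proj(v * torch.sigmoid(g))

class GQABlock(nn.Module):
    """Grouped-query attention: many query heads share a few key/value heads,
    shrinking the KV cache that dominates on-device memory use."""
    def __init__(self, dim: int, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        self.h, self.kv, self.d = n_heads, n_kv_heads, dim // n_heads
        self.q = nn.Linear(dim, n_heads * self.d)
        self.k = nn.Linear(dim, n_kv_heads * self.d)
        self.v = nn.Linear(dim, n_kv_heads * self.d)
        self.o = nn.Linear(dim, dim)

    def forward(self, x):                                           # x: (batch, seq, dim)
        b, t, _ = x.shape
        q = self.q(x).view(b, t, self.h, self.d).transpose(1, 2)
        k = self.k(x).view(b, t, self.kv, self.d).transpose(1, 2)
        v = self.v(x).view(b, t, self.kv, self.d).transpose(1, 2)
        k = k.repeat_interleave(self.h // self.kv, dim=1)           # share KV heads across query groups
        v = v.repeat_interleave(self.h // self.kv, dim=1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o(y.transpose(1, 2).reshape(b, t, -1))
```

The convolution block does local token mixing cheaply on CPU, while the occasional GQA layer handles long-range dependencies with a small KV cache, which is what makes the hybrid attractive under device memory ceilings.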
This matters for enterprise teams in three ways:
- Predictability. The architecture is simple, parameter-efficient and stable across model sizes ranging from 350M to 2.6B.
- Operational portability. Dense and MoE variants share the same structural backbone, simplifying deployment across mixed hardware fleets.
- On-device compatibility. Prefill and decode throughput on CPU is in many cases roughly 2× higher than comparable open models, reducing the need to offload routine tasks to cloud inference endpoints (a rough way to check this on your own hardware follows this list).
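For teams that want to sanity-check throughput claims on their own machines, a crude harness like the one below works with any causal LM served through Hugging Face transformers. The checkpoint id, prompt and token counts are illustrative; substitute the models and workloads you actually care about, and note that these numbers are rough rather than rigorous benchmarks.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint id; swap in the model you are evaluating and make sure
# your transformers version supports its architecture.
MODEL_ID = "LiquidAI/LFM2-1.2B"

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float32)
model.eval()

prompt = "Summarize the following maintenance log in one sentence:\n" + "pump 7 vibration high " * 64
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    # Rough time-to-first-token: prefill plus a single decoded token.
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=1, do_sample=False)
    ttft = time.perf_counter() - start

    # Rough decode throughput (includes prefill, so treat it as a lower bound).
    n_new = 128
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=n_new, do_sample=False)
    tok_per_s = n_new / (time.perf_counter() - start)

print(f"approx TTFT: {ttft:.3f}s, approx decode: {tok_per_s:.1f} tok/s")
```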
Rather than optimizing for academic novelty, the report reads as a systematic effort to design enterprise models that actually ship.
That focus is notable, and more useful to enterprises, in a landscape where many open models tacitly assume access to multi-H100 clusters even during inference.
A training pipeline tailored for enterprise-relevant behavior
LFM2 adopts a training approach that compensates for the models’ small scale through structure rather than brute force. Key elements include:
- 10-12T tokens of pre-training plus an extra 32K-context mid-training stage, which expands the model’s useful context window without exploding compute cost.
- A decoupled top-k knowledge distillation objective that removes the instability of standard KL distillation when the teacher supplies only partial (top-k) logits; a sketch of one possible formulation follows this list.
- A three-stage post-training sequence of SFT, length-normalized preference alignment and model merging, designed to produce more reliable instruction following and tool-use behavior.
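The report describes the distillation objective at a high level; one plausible reading, sketched below and not necessarily the exact formulation Liquid used, decouples "how much probability mass the student places on the teacher’s top-k support" from "how that mass is distributed within the support," so the missing tail logits cannot destabilize training.

```python
import torch
import torch.nn.functional as F

def topk_decoupled_kd_loss(student_logits, topk_ids, topk_teacher_probs,
                           alpha: float = 1.0, beta: float = 1.0):
    """Illustrative decoupled top-k distillation loss (one position per row).

    student_logits:     (batch, vocab) raw student logits
    topk_ids:           (batch, k) vocabulary ids of the teacher's top-k tokens
    topk_teacher_probs: (batch, k) teacher probabilities for those tokens
    """
    log_p = F.log_softmax(student_logits, dim=-1)
    student_topk_logp = log_p.gather(-1, topk_ids)                    # (batch, k)

    # Term 1: binary KL on how much mass each side places on the top-k support
    # versus the unseen tail.
    t_mass = topk_teacher_probs.sum(-1).clamp(1e-6, 1 - 1e-6)
    s_mass = student_topk_logp.exp().sum(-1).clamp(1e-6, 1 - 1e-6)
    mass_loss = (t_mass * (t_mass / s_mass).log()
                 + (1 - t_mass) * ((1 - t_mass) / (1 - s_mass)).log()).mean()

    # Term 2: KL between the renormalized distributions restricted to the top-k
    # support, so the absent tail logits cannot blow up the gradient.
    t_in = topk_teacher_probs / topk_teacher_probs.sum(-1, keepdim=True)
    s_in_logp = student_topk_logp - student_topk_logp.logsumexp(-1, keepdim=True)
    support_loss = (t_in * (t_in.clamp_min(1e-9).log() - s_in_logp)).sum(-1).mean()

    return alpha * mass_loss + beta * support_loss
```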
For enterprise AI developers, the significance is that LFM2 models behave less like “little LLMs” and more like practical agents capable of following structured formats, adhering to JSON schemas and managing multi-turn chat flows. Many open models of similar size fail not for lack of reasoning ability but because of weak adherence to instruction templates. The LFM2 post-training recipe targets these rough edges directly.
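The payoff shows up at integration time, where agent runtimes typically gate tool execution with a schema check. The snippet below is a generic illustration using the jsonschema library; the TOOL_CALL_SCHEMA and tool names are hypothetical, not part of LFM2 or LEAP.

```python
import json
from jsonschema import ValidationError, validate  # pip install jsonschema

# Hypothetical tool-call contract an agent runtime might enforce.
TOOL_CALL_SCHEMA = {
    "type": "object",
    "properties": {
        "tool": {"type": "string", "enum": ["search_inventory", "create_ticket"]},
        "arguments": {"type": "object"},
    },
    "required": ["tool", "arguments"],
    "additionalProperties": False,
}

def parse_tool_call(model_output: str) -> dict:
    """Accept the model's tool call only if it parses and matches the schema."""
    try:
        call = json.loads(model_output)
        validate(call, TOOL_CALL_SCHEMA)
        return call
    except (json.JSONDecodeError, ValidationError) as err:
        # Small models with weak template adherence fail at exactly this gate.
        raise ValueError(f"invalid tool call: {err}") from err

print(parse_tool_call('{"tool": "create_ticket", "arguments": {"priority": "high"}}'))
```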
In other words: Liquid AI optimized its smaller models for operational reliability, not just the scoreboard.
Multimodality is designed for device constraints, not a lab demo
The LFM2-VL and LFM2-Audio variants reflect another shift: multimodality built around token efficiency rather than demo appeal.
Instead of embedding a huge vision transformer directly into the LLM, LFM2-VL attaches a SigLIP2 encoder via a connector that aggressively reduces the vision token count using PixelUnshuffle. High-resolution inputs automatically trigger dynamic tiling, keeping token budgets under control even on mobile hardware. LFM2-Audio uses a bifurcated audio path – one for embedding, one for generation – supporting real-time transcription or speech-to-speech on modest CPUs.
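A rough sketch of what such a connector can look like, using PyTorch’s built-in PixelUnshuffle: the fold factor, embedding sizes and patch-grid shape below are illustrative assumptions, not Liquid’s published configuration.

```python
import torch
import torch.nn as nn

class PixelUnshuffleConnector(nn.Module):
    """Fold a 2x2 neighborhood of vision-encoder patches into the channel
    dimension, cutting the number of vision tokens handed to the LLM by 4x."""
    def __init__(self, vision_dim: int = 768, lm_dim: int = 2048, factor: int = 2):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(factor)          # (C, H, W) -> (C*f*f, H/f, W/f)
        self.proj = nn.Linear(vision_dim * factor * factor, lm_dim)

    def forward(self, patch_grid: torch.Tensor) -> torch.Tensor:
        # patch_grid: (batch, H, W, vision_dim) from a ViT-style encoder such as SigLIP2
        x = patch_grid.permute(0, 3, 1, 2)                  # to (B, C, H, W)
        x = self.unshuffle(x)                               # fewer, wider "super-patches"
        b, c, h, w = x.shape
        return self.proj(x.permute(0, 2, 3, 1).reshape(b, h * w, c))  # (B, tokens, lm_dim)

# Example: a 24x24 patch grid (576 tokens) becomes 144 tokens after the connector.
tokens = PixelUnshuffleConnector()(torch.randn(1, 24, 24, 768))
print(tokens.shape)  # torch.Size([1, 144, 2048])
```

Fewer vision tokens means less prefill work for the language model, which is where most of the latency budget goes on mobile hardware.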
For enterprise platform architects, this design points to a practical future where:
- Document understanding occurs directly at endpoints such as field devices;
- Audio transcription and speech agents run locally for privacy compliance;
- Multimodal agents operate within a fixed latency envelope without streaming data off the device and back.
The through-line is the same: multimodal capability without the need for a GPU farm.
Retrieval models built for agent systems, not legacy search
LFM2-ColBERT brings late-interaction retrieval into a footprint small enough for on-device and on-premises deployments, enabling multilingual RAG without the overhead of specialized vector-database accelerators.
This is especially meaningful as organizations begin to consolidate fleets of agents. Fast local retrieval running on the same hardware as the reasoning model reduces latency and is a governance win: documents never leave the device boundary.
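Late interaction itself is simple to sketch: each query token is matched against its best document token and the maxima are summed. The scoring below is the standard ColBERT-style MaxSim; the embedding dimensions are arbitrary placeholders.

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late-interaction scoring.

    query_emb: (n_query_tokens, dim) L2-normalized token embeddings
    doc_emb:   (n_doc_tokens, dim)   L2-normalized token embeddings
    """
    sim = query_emb @ doc_emb.T            # cosine similarity matrix
    return sim.max(dim=-1).values.sum()    # best document token per query token, summed

# Usage: rank locally stored document embeddings against a query, entirely on-device.
q = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
docs = [torch.nn.functional.normalize(torch.randn(n, 128), dim=-1) for n in (40, 95, 60)]
ranking = sorted(range(len(docs)), key=lambda i: -maxsim_score(q, docs[i]))
print(ranking)
```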
Taken together, the VL, Audio and ColBERT variants show LFM2 to be a modular system rather than a single model drop.
An emerging blueprint for hybrid enterprise AI architecture
Across all its variants, the LFM2 report sketches what tomorrow’s enterprise AI stack is likely to look like: hybrid local-cloud orchestration, in which smaller, faster models running on devices handle time-critical perception, formatting, tool invocation and decision tasks, while larger models in the cloud provide heavy reasoning when needed.
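In code, that control-plane pattern can be as simple as a router that keeps structured, latency-sensitive work local and escalates open-ended reasoning to a hosted model. The backends and escalation heuristics below are placeholders for whatever inference clients an organization already uses; none of this is part of a Liquid product.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HybridRouter:
    run_local: Callable[[str], str]    # small on-device model: formatting, tool calls, routing
    run_cloud: Callable[[str], str]    # large hosted model: heavy, open-ended reasoning
    escalation_markers: tuple = ("multi-document synthesis", "long chain of reasoning")

    def handle(self, task: str) -> str:
        # The local "control plane" decides whether a task fits its latency and
        # capability envelope or should be escalated to the cloud "accelerator".
        needs_cloud = any(marker in task.lower() for marker in self.escalation_markers)
        return self.run_cloud(task) if needs_cloud else self.run_local(task)

# Example wiring with stub backends:
router = HybridRouter(
    run_local=lambda t: f"[local small model] {t[:40]}...",
    run_cloud=lambda t: f"[cloud LLM] {t[:40]}...",
)
print(router.handle("Extract the invoice number as JSON."))
print(router.handle("Perform multi-document synthesis across 30 contracts."))
```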
Several familiar drivers converge here:
- Cost control. Running routine inference locally avoids unpredictable cloud billing.
- Latency determinism. TTFT and decode consistency matter in agent workflows; on-device execution eliminates network jitter.
- Governance and compliance. Local execution simplifies PII handling, data residency and auditability.
- Resilience. Cloud-dependent agent systems degrade badly when the network path becomes unavailable; local models keep core workflows running.
Enterprises adopting these architectures will likely consider smaller on-device models as the “control plane” of agentic workflows, with larger cloud models serving as on-demand accelerators.
LFM2 is one of the clearest open-source foundations for that control layer to date.
Strategic Conclusion: On-device AI is now a design choice, not a compromise
For years, organizations building AI features have assumed that “real AI” requires cloud inference. LFM2 challenges that notion. The models perform competitively in reasoning, instruction following, multilingual tasks and RAG, while delivering substantial latency advantages over other open small-model families.
For CIOs and CTOs finalizing the 2026 roadmap, the implication is direct: Small, open, on-device models are now robust enough to carry meaningful pieces of production workloads.
LFM2 will not replace frontier cloud models for frontier-scale reasoning. But it provides something enterprises arguably need more: a reproducible, open and operationally viable base for agentic systems that must run anywhere, from phones to industrial endpoints to air-gapped secure facilities.
In the broader landscape of enterprise AI, LFM2 is less a research milestone and more a sign of architectural convergence. The future isn’t a cloud or an edge – it’s both working together. And releases like LFM2 provide the building blocks for organizations that are ready to build that hybrid future, not accidentally but intentionally.