Running local models on Macs gets faster with Ollama’s MLX support


Ollama, a runtime for running large language models on a local machine, has introduced support for MLX, Apple's open source machine learning framework. Ollama also says it has improved caching performance, and the runtime now supports Nvidia's NVFP4 quantization format, leading to more efficient memory usage in some models.

Combined, these developments promise significantly better performance on Macs with Apple Silicon chips (M1 or later). The timing couldn't be better: local models are gaining traction with researcher and hobbyist communities in ways they haven't before.

The recent runaway success of OpenClaw, which has racked up over 300,000 stars on GitHub, made headlines with experiments like Moltbook, and become an obsession (especially in China), has many people experimenting with running models on their own machines.

As developers grow frustrated with rate limits and the high cost of top-tier subscriptions to tools like Claude Code or OpenAI's Codex, experimentation with local coding models has heated up. (Ollama also recently expanded its Visual Studio Code integration.)

The new support is available in preview in Ollama 0.19 and currently covers just one model: the 35-billion-parameter version of Alibaba's Qwen3.5. The hardware requirements are steep by ordinary users' standards. According to Ollama's announcement, users not only need a Mac with Apple silicon, they also need at least 32GB of RAM.
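
For readers who want to try the preview, here is a minimal sketch of querying a locally running Ollama server through its official Python client. The model tag "qwen3.5:35b" is an assumption for illustration only; check `ollama list` or the Ollama model library for the exact name of the MLX preview build, and make sure the model has been pulled and the server is running first.

```python
# Minimal sketch: chat with a locally served model via Ollama's Python client.
# Requires: pip install ollama, an Ollama server on localhost:11434,
# and the model already pulled (e.g. via `ollama pull <model>`).
import ollama

response = ollama.chat(
    model="qwen3.5:35b",  # hypothetical tag; substitute the published model name
    messages=[
        {"role": "user", "content": "Summarize what Apple's MLX framework is in one sentence."},
    ],
)

# The response carries the assistant's reply under message.content.
print(response["message"]["content"])
```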
