The best kind of follow-up story isn’t the one that sets the record straight about something someone got wrong (although I enjoy those too, especially when that “someone” isn’t me); it’s the one that adds more context to a story that was incomplete. My M5 iPad Pro review was an incomplete story. As you may recall, I was unable to test Apple’s promised 3.5× improvement in local AI processing enabled by the new Neural Accelerators built into the M5’s GPU. It’s not that I didn’t believe Apple’s numbers; due to the preliminary nature of the software and my time constraints, I simply couldn’t verify them myself.
Well, I was finally able to test local AI performance with a pre-release version of MLX optimized for the M5, and let me tell you: not only is the hype real, but the numbers I got from my extensive tests over the past two weeks actually exceed Apple’s claims.
This article won’t be long, and I’ll let the charts I created for the occasion speak for themselves. Essentially, as I suggested in my iPad Pro review, the M5’s local AI gains largely apply to prompt processing (the prefill stage, when the LLM has to ingest the user’s prompt and load it into its context) and result in much lower time to first token (TTFT) numbers. Since the Neural Accelerators’ speedup is most pronounced with longer prompts (where the latency difference between the M4 and M5 is easier to measure), I focused on testing two long prompt sizes: 10,000 and 16,000 tokens.
In my new tests, based on an updated version of one of the published MLX samples, I used Qwen3-8B-4bit to measure local AI performance on my old M4 iPad Pro and the new M5 iPad Pro. As you’ll see in the chart below, the prompt that took the M4 81 seconds to process was ingested in 18 seconds by the M5 – a 4.4× improvement in TTFT. The numbers get even more impressive with longer prompts: while it took the M4 118 seconds (about two minutes!) to start answering the 16,000-token prompt, the M5 did it in 38 seconds.
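If you want a feel for what this kind of test looks like, here’s a minimal sketch using the Python version of mlx-lm. To be clear, this is not my exact benchmark script: the mlx-community repo name is an assumption, the filler prompt is only roughly 10,000 tokens, and my iPad numbers above came from a pre-release, M5-optimized MLX build that this sketch doesn’t reproduce.

```python
# Minimal TTFT benchmark sketch with Python mlx-lm (assumptions noted above).
import time
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Qwen3-8B-4bit")  # assumed repo id

# Repeat a filler sentence to build a long prompt; roughly 10,000 tokens.
long_prompt = "The quick brown fox jumps over the lazy dog. " * 1000
print(f"Prompt tokens: {len(tokenizer.encode(long_prompt))}")

# TTFT = time from submitting the prompt until the first token arrives.
# For prompts this long, it's dominated by the prefill stage.
start = time.perf_counter()
for _ in stream_generate(model, tokenizer, prompt=long_prompt, max_tokens=1):
    break
print(f"TTFT: {time.perf_counter() - start:.2f}s")
```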
But enough paragraphs about numbers! Let’s look at some pretty charts instead.
[Chart: TTFT for a 10,000-token prompt, M4 vs. M5]
[Chart: TTFT for a 16,000-token prompt, M4 vs. M5]
As you can see, there are some improvements in token generation between the M4 and M5, but at 1.5× faster generation, they’re minor. The star of the show on the M5 is the speedy prefill stage: the Neural Accelerators dramatically reduce the time the M5 needs to process long prompts and start generating a response. The fact that this is happening on a consumer-grade tablet that’s thin, light, and fanless is even more impressive.
In practice, this means a few things.
If you’re a developer of local AI apps for iPad, I highly recommend getting started with MLX and considering features that take advantage of long prompts. RAG apps for cross-document search, LLM clients with “project” features backed by system-level instructions, and local AI clients that integrate with MCP servers (MCP tools notoriously fill an LLM’s context window with instructions and tool descriptions) are the apps that stand to benefit most from the M5’s faster prompt processing, especially at longer context sizes. There’s a sketch of what I mean below.
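To make the RAG angle concrete, here’s a hedged sketch of the shape of that flow, again in Python mlx-lm for brevity (an actual iPad app would use the MLX Swift stack). The `retrieve_documents()` function and the model repo id are hypothetical stand-ins; the point is that every retrieved excerpt lands in the prompt and is paid for during prefill, which is exactly where the M5 shines.

```python
# RAG-style sketch: long retrieved context goes through the prefill stage.
from mlx_lm import load, generate

def retrieve_documents(query: str) -> list[str]:
    # Hypothetical placeholder: a real app would query an on-device index.
    return ["(long excerpt from contract A)", "(long excerpt from contract B)"]

model, tokenizer = load("mlx-community/Qwen3-8B-4bit")  # assumed repo id

query = "Summarize the refund terms across these contracts."
context = "\n\n".join(retrieve_documents(query))

# Everything concatenated here is processed during prefill before the
# first answer token appears -- the part the M5 accelerates the most.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```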
For users, although the iPad’s app ecosystem for local AI remains largely aspirational and behind the curve compared to macOS, there are early signs of iPad apps that will tap into the M5’s power. Apps like LocalAI, OfflineLM, and Craft (which supports local, offline assistants on iOS and iPadOS) should theoretically be able to leverage the M5 and deliver considerable performance gains over the M4.
The M5 alone doesn’t change the fact that local AI is a niche, and local AI on iPadOS is a niche within a niche right now. The power is there, though, and as the public version of MLX gains support for the Neural Accelerators, we may begin to see a progressive buildup of tablet AI apps that can run offline – and inherently more private – LLMs with the kind of performance that was previously exclusive to desktops.
With that kind of power, it would be a shame if nobody took advantage of it. I hope third-party iPad app developers will, and I’ll be along for the ride.