
Google’s Gemini AI models have improved significantly over the past year, but you can use Gemini only on Google’s terms. The company’s Gemma open-weight models offer more freedom, though Gemma 3, which launched a year ago, is starting to show its age. Starting today, developers can get to work with Gemma 4, which comes in four sizes optimized for local use. Google has also acknowledged developer frustration with AI licensing, so it is retiring the custom Gemma license.
Like previous versions of its open-weight models, Google has designed Gemma 4 to be usable on local machines. Of course, that can mean many things. The two larger Gemma variants, the 26B Mixture of Experts and the 31B Dense, are designed to run without quantization in bfloat16 format on a single 80 GB Nvidia H100 GPU. Granted, that's a $20,000 AI accelerator, but it's still local hardware. Quantized to run at lower precision, these larger models will fit on consumer GPUs.
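The memory math here is straightforward back-of-the-envelope arithmetic: weights in bfloat16 take two bytes per parameter, so the footprint scales directly with parameter count and precision. A quick sketch (the function name and 4-bit figure are illustrative, and real usage also includes activations and the KV cache):

```python
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate gigabytes needed just to hold the model weights."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# 31B dense weights in bfloat16 (2 bytes/param): ~62 GB, fits on an 80 GB H100.
print(weight_vram_gb(31, 2))    # 62.0

# Quantized to 4-bit (0.5 bytes/param): ~15.5 GB, consumer-GPU territory.
print(weight_vram_gb(31, 0.5))  # 15.5
```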
Google also claims it has focused on reducing latency to take full advantage of Gemma's local processing. The 26B Mixture of Experts model activates only 3.8 billion of its 26 billion parameters during inference, giving it many more tokens per second than similarly sized models. Meanwhile, the 31B Dense model prioritizes quality over speed, but Google hopes developers will optimize it for specific uses.
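That speedup comes from how mixture-of-experts layers work: a small router picks a few experts per token, so most of the model's parameters sit idle on any given step. A toy NumPy sketch of the idea (the sizes and routing scheme here are illustrative, not Gemma's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 16, 2

# Each expert is its own weight matrix; the gate scores experts per token.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate = rng.standard_normal((d, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ gate                       # router score for each expert
    chosen = np.argsort(scores)[-top_k:]    # activate only the top-k experts
    w = np.exp(scores[chosen])
    w /= w.sum()                            # softmax over the chosen experts
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))

out = moe_forward(rng.standard_normal(d))
# Only top_k / n_experts of the expert weights are touched per token,
# which is why active parameters (3.8B) can be far below the total (26B).
```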
The other two Gemma 4 models, Effective 2B (E2B) and Effective 4B (E4B), are aimed at mobile devices. These options were designed to keep memory usage low during inference, running at an effective 2 billion or 4 billion parameters. Google says the Pixel team worked closely with Qualcomm and MediaTek to optimize these models for devices like smartphones, the Raspberry Pi, and the Jetson Nano. Not only do they use less memory and battery than Gemma 3, but Google also claims "near-zero latency" this time around.
More powerful, more open
All of the new Gemma 4 models reportedly leave Gemma 3 in the dust, and Google claims these are the most capable models you can run on local hardware. Google says the Gemma 4 31B will come in third on the Arena list of top open AI models, behind GLM-5 and KM2.5. However, even the largest Gemma 4 version is a fraction of the size of those models, which theoretically makes it much cheaper to run.