Gemma 4 brings local AI to phones and consumer devices
Google has introduced Gemma 4, an open-weight AI model family built to run locally on smartphones and other consumer hardware. The family is based on Gemini 3 and comes in four versions tuned for different needs, letting developers and users pick the option that best fits their workload.
What stands out most is the local-first design. Some Gemma 4 variants are built to run fully offline, with no internet connection required during use. That opens the door to on-device AI experiences across phones and small edge systems without any dependence on cloud access.
Gemma 4 model sizes and what each version is built for
Large Gemma 4 models target advanced local workloads
The two biggest Gemma 4 models are the 26B Mixture of Experts model and the 31B Dense model. Running either unquantized in bfloat16 format requires a GPU with 80GB of memory, such as an Nvidia H100.
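The 80GB figure follows from simple arithmetic: bfloat16 stores each parameter in two bytes, so the weights alone occupy roughly 52GB for the 26B model and 62GB for the 31B model, before counting activations and the KV cache. A quick back-of-the-envelope check:

```python
# Rough bf16 memory footprint: 2 bytes per parameter, weights only.
def bf16_weight_gb(params_billions: float) -> float:
    return params_billions * 1e9 * 2 / 1e9  # decimal gigabytes

for name, size in [("26B MoE", 26.0), ("31B Dense", 31.0)]:
    print(f"Gemma 4 {name}: ~{bf16_weight_gb(size):.0f} GB of weights")
# ~52 GB and ~62 GB; the remaining headroom on an 80GB card goes to
# activations and the KV cache during inference.
```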
Google says these larger versions bring frontier-level intelligence to personal computers for students, researchers, and developers. They are positioned for advanced reasoning tasks across IDEs, coding assistants, and agentic workflows.
The 26B model emphasizes efficiency during inference
The 26B model uses a Mixture of Experts design and activates only 3.8 billion of its 26 billion parameters during inference. That smaller active footprint is intended to improve tokens-per-second throughput and cut latency compared with dense models of similar size.
This makes the 26B version notable for users who want stronger performance without activating the full parameter count on every pass.
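Google has not published Gemma 4's routing details, but the general Mixture of Experts mechanism is easy to illustrate. The minimal PyTorch sketch below shows the technique in the abstract, not Gemma 4's actual architecture: a router sends each token to its top-k experts, so only those experts' weights participate in a given forward pass.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Minimal Mixture of Experts layer: a router picks k experts per
    token, so only a fraction of the total parameters are active."""
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, dim]
        weights, idx = self.router(x).topk(self.k, dim=-1)  # k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

In this toy layer with 8 experts and k = 2, only a quarter of the expert parameters touch any given token, which is the same principle behind activating 3.8B of 26B parameters.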
The 31B model is built for maximum raw quality
The 31B Dense model takes a different path. Instead of focusing on selective activation, it is designed to maximize raw quality. Google also says developers can fine-tune it for specific use cases, making it more flexible for specialized deployments where output quality matters most.
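The fine-tuning point is concrete enough to sketch. One common route for adapting a large open-weight model on modest hardware is parameter-efficient fine-tuning with LoRA via Hugging Face's peft library; the model ID and target module names below are assumptions for illustration, not a published recipe from Google.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical repository name; substitute the real one once published.
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-31b")

# Attach small trainable adapter matrices to the attention projections
# (q_proj/v_proj are typical targets for Gemma-style models).
config = LoraConfig(r=16, lora_alpha=32,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapters train; base weights stay frozen
```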
Gemma 4 models that can run offline on smartphones
Effective 2B and Effective 4B are the most relevant for everyday devices
For most end-users, the two most practical options are Effective 2B and Effective 4B. These smaller Gemma 4 variants can run entirely offline and require minimal memory during inference.
They use only 2 billion and 4 billion parameters, respectively. According to Google, reducing the number of active parameters is what allows these models to operate on mobile and IoT hardware.
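As a sketch of what that looks like in practice, here is the standard Hugging Face pattern for running a model with networking disabled; the model ID is hypothetical, and `local_files_only=True` simply forces loading from weights already cached on the device.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-4-2b-it"  # hypothetical ID for illustration

# local_files_only=True loads from the on-device cache, so inference
# runs with no internet connection at all.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, local_files_only=True)

inputs = tokenizer("Summarize this note:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```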
Supported device types include phones and compact edge systems
Google says these smaller models can run on:
- Smartphones
- Raspberry Pi
- Jetson Nano
That matters because it shifts AI usage closer to the device itself. Instead of treating AI as something that always needs remote infrastructure, Gemma 4 makes room for more local deployment on familiar hardware people already use.
Gemma 4 performance claims and leaderboard position
Google says Gemma 4 is not only significantly faster than Gemma 3, but also the most capable AI model family it has designed for local hardware.
Independent testing appears to support at least part of that claim. On the Arena AI leaderboard for open models, the 31B model holds the #3 position, behind GLM-5 and Kimi 2.5. The 26B model ranks #6.
Those placements help reinforce the idea that Gemma 4 is not just lighter or more portable. It is also being positioned as a serious local AI option for stronger reasoning and development-focused tasks.
Apache 2.0 license makes Gemma 4 more usable for developers
Gemma 4 uses Apache 2.0 instead of a restrictive custom license
Gemma 4 has been released under the Apache 2.0 license. That gives developers broad permission to integrate the models into apps and services without restrictive usage terms.
This is a major shift from Gemma 3, which used a custom Google license with stricter policies and multiple limitations. Those restrictions made Gemma 3 less appealing for some developers, especially those building commercial or widely distributed tools.
Commercial use, modification, and redistribution are allowed
Under Apache 2.0, developers can use Gemma 4 for commercial purposes, modify it, redistribute it, and deploy it, with attribution as the main requirement.
That licensing approach makes Gemma 4 more practical for real product development. For teams that want fewer legal or operational barriers, this is a meaningful improvement.
Why Gemma 4 is open-weight, not fully open-source
Despite the Apache 2.0 license, Gemma 4 is described as open-weight rather than fully open-source.
According to the Open Source Initiative, an AI model can only be considered open-source when the full training dataset, scripts, infrastructure code, and detailed methodology are also released. In this case, Google is releasing the model parameters, but not the complete reproducible training pipeline.
That means others cannot recreate the model from scratch based on what has been published. Still, for many developers, that distinction may not be a deal-breaker. The license continues to allow broad use, modification, redistribution, and deployment, which covers the practical needs of many real-world projects.
Gemma 4’s local AI focus from mobile to 30B-class systems
Gemma 4 spans a wide range of deployment targets, from lightweight offline models for phones and IoT devices to larger 30B-class systems aimed at advanced reasoning on personal computers.
That range is the real story here. The smaller models focus on low memory use and offline execution. The larger ones push for reasoning quality, coding support, and agentic workflows. Together, they give Google a model family that scales from mobile hardware to high-end GPU environments while staying available under a much more developer-friendly license.