The Test
Google markets Gemma 4 as "optimized for mobile deployment." That's a strong claim. So we tested it on actual hardware:
- Device: High-end phone, 8GB RAM, 256GB storage
- App: Offline AI (Hugging Face integration)
- Model: Stock gemma4:e2b (7.2 GB) — the smallest official model
- Task: General conversation, typical Q&A
A high-end phone running the smallest official model is the best-case scenario for mobile deployment. If it fails here, it will fail everywhere.
The phone became uncomfortably hot within minutes. Performance degraded rapidly. After 5-10 minutes, responses slowed to unusability. This is not a "mobile-ready" experience.
Why It Failed
1. Memory Pressure
Android typically reserves ~2GB for the OS and another ~500MB for system services, leaving roughly 5.5GB for apps. Loading a 7.2 GB model into that budget means:
- Immediate memory swap: Model doesn't fit in available RAM
- Constant disk I/O: System swaps memory pages to storage
- Performance degradation: Flash storage (UFS) is orders of magnitude slower than RAM
- CPU thrashing: Processor wastes cycles managing memory
2. Memory Bandwidth
Even if the model fits, inference requires reading weights constantly:
- 7.2 GB × 20 tokens/sec ≈ 144 GB/s: token-by-token decoding reads roughly every weight once per token
- Mobile RAM bandwidth: Much lower than desktop DDR5
- Heat generation: Memory access generates significant thermal output
- Battery drain: High bandwidth = high power consumption
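The bandwidth demand is easy to quantify with back-of-the-envelope arithmetic. This sketch assumes dense decoding where every weight is read once per generated token, and ignores KV-cache and activation traffic; the LPDDR5 figure in the comment is a rough ballpark, not a measurement of any specific phone:

```python
# Rough bandwidth estimate for token-by-token decoding.
# Assumes every weight is read once per token (dense model, no caching tricks).
MODEL_SIZE_GB = 7.2   # stock gemma4:e2b weights
TOKENS_PER_SEC = 20   # target decode speed

required_bandwidth = MODEL_SIZE_GB * TOKENS_PER_SEC  # GB/s
print(f"Required read bandwidth: {required_bandwidth:.0f} GB/s")
# Typical LPDDR5 phones peak somewhere around 50-70 GB/s,
# so this demand is unreachable even before thermal limits kick in.
```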
3. Thermal Throttling
Phones are thermally constrained by design:
- No active cooling: Unlike laptops/desktops, phones rely on passive cooling
- Small thermal mass: Heats up quickly, cools down slowly
- Throttling kicks in early: Typically at 45-50°C to prevent damage
- Performance collapse: CPU/GPU clocks reduced by 50%+ when throttled
Large model → High memory bandwidth → Heat generation → Thermal throttling → Slower inference → Longer sessions → More heat → Deeper throttling → Unusable performance
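The feedback loop above can be illustrated with a toy simulation. Every constant here is made up for illustration (heat per clock tick, cooling rate, the 45°C threshold); the point is the dynamic, not the numbers:

```python
# Toy model of the thermal feedback loop: sustained load heats the phone,
# the governor cuts clocks, and performance collapses while heat lingers.
# All constants are illustrative, not measured values.
temp_c = 30.0        # starting temperature
clock_pct = 100.0    # CPU clock as % of max
THROTTLE_AT = 45.0   # assumed passive-cooling throttle threshold

for minute in range(10):
    heat = 4.0 * (clock_pct / 100)     # heat added scales with clock speed
    cooling = 0.05 * (temp_c - 25.0)   # weak passive cooling toward ambient
    temp_c += heat - cooling
    if temp_c > THROTTLE_AT:
        clock_pct = max(40.0, clock_pct - 15)  # governor steps clocks down
    print(f"min {minute}: {temp_c:.1f} C, clocks {clock_pct:.0f}%")
```

Even in this crude model, clocks bottom out while the temperature stays above the throttle point, matching the "deeper throttling, unusable performance" end state.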
Testing gemma4-nano
Next test: our optimized gemma4-nano:e2b (3.1 GB) on the same phone:
| Metric | Stock gemma4:e2b | gemma4-nano:e2b |
|---|---|---|
| Model Size | 7.2 GB | 3.1 GB |
| RAM Fit | ❌ Requires swap | ✅ Fits comfortably |
| Temperature | 🔥 Hot | ✅ Warm but stable |
| Performance | ❌ Slow, degrading | ✅ Fast, consistent |
| CPU Clocks | ⚠️ Throttled | ✅ Pinned (working hard) |
| Usability | ❌ No | ✅ Yes |
Why Nano Succeeds
The nano model works because it respects mobile constraints:
- Fits in RAM: 3.1 GB + OS + apps = under 6GB total
- Lower bandwidth: 57% smaller = 57% less memory traffic
- Stays cool: Reduced memory access = less heat
- Sustained performance: No thermal throttling = consistent speed
The CPU still runs hard (clocks pinned), but critically: it doesn't overheat. The thermal envelope is manageable. This is the difference between "technically possible" and "actually usable."
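The "57% smaller" figure is simple arithmetic worth double-checking, since the bandwidth argument inherits it directly:

```python
# Size reduction of the nano model relative to stock, from the sizes above.
STOCK_GB = 7.2
NANO_GB = 3.1

reduction = 1 - NANO_GB / STOCK_GB
print(f"Size reduction: {reduction:.0%}")  # -> Size reduction: 57%
```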
The Math
Let's break down why size matters so much on mobile:
Available RAM Budget
- Total RAM: 8GB
- Android OS: ~2GB
- System services: ~500MB
- Available for apps: ~5.5GB
Stock Model (7.2 GB)
- Needs: 7.2 GB
- Available: 5.5 GB
- Deficit: 1.7 GB → Constant swapping
Nano Model (3.1 GB)
- Needs: 3.1 GB
- Available: 5.5 GB
- Headroom: 2.4 GB → Room to breathe
That 2.4 GB of headroom is the difference between thermal throttling and sustained performance.
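The budget above reduces to a few lines of arithmetic, using only the figures already stated in this section:

```python
# RAM budget arithmetic from the section above (all figures in GB).
TOTAL_RAM = 8.0
ANDROID_OS = 2.0
SYSTEM_SERVICES = 0.5
available = TOTAL_RAM - ANDROID_OS - SYSTEM_SERVICES  # 5.5 GB for apps

for name, model_gb in [("stock gemma4:e2b", 7.2), ("gemma4-nano:e2b", 3.1)]:
    margin = available - model_gb
    if margin > 0:
        print(f"{name}: {margin:.1f} GB headroom")
    else:
        print(f"{name}: {-margin:.1f} GB deficit, constant swapping")
```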
Industry Implications
For Google
Gemma 4 is technically impressive, but the "mobile-ready" claim is misleading:
- "Mobile-ready" should mean: Fast, cool, usable on typical phones
- Google's definition: Can technically load on 8GB+ device
- The gap: Lab benchmarks miss thermal constraints
For Users
If you have a phone with 8GB RAM and try to run stock Gemma 4:
- Expect overheating
- Expect performance degradation
- Expect battery drain
- Consider nano instead
For Developers
Building mobile AI apps? Key takeaways:
- 4GB should be your target for models on 8GB phones
- 3GB is better for budget phones (4-6GB RAM)
- Test on real devices for extended sessions (30+ minutes)
- Monitor thermals not just performance metrics
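Those sizing rules can be encoded as a simple fit check in a model picker. This is a hypothetical helper, not part of any shipping app; the OS reserve and headroom defaults are assumptions drawn from the budget worked out above:

```python
# Hypothetical fit check for a mobile model picker.
# os_reserve_gb and min_headroom_gb are assumed defaults, not official specs.
def fits_comfortably(model_gb: float, device_ram_gb: float,
                     os_reserve_gb: float = 2.5,
                     min_headroom_gb: float = 1.0) -> bool:
    """True if the model fits in RAM with headroom to spare."""
    available = device_ram_gb - os_reserve_gb
    return model_gb + min_headroom_gb <= available

print(fits_comfortably(7.2, 8))  # False: stock model would need swap
print(fits_comfortably(3.1, 8))  # True: nano fits with room to breathe
```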
What "Mobile-Ready" Actually Means
There's a massive gap between these definitions:
| Aspect | Marketing Definition | Reality Definition |
|---|---|---|
| Size | "Fits on device" | "Fits in RAM with headroom" |
| Performance | "Can run" | "Runs fast without throttling" |
| Thermals | (ignored) | "Stays cool for hours" |
| Battery | (ignored) | "Reasonable power draw" |
| User Experience | "Technically works" | "Pleasant to use daily" |
A model isn't "mobile-ready" if it overheats your phone. Period. Real mobile readiness means sustained performance at comfortable temperatures.
Try It Yourself
You can validate these findings on your own phone:
- Install Offline AI (supports Hugging Face models)
- Try stock gemma4:e2b from Google
- Try gemma4-nano:e2b from ssfdre38/gemma4-nano-gguf
- Compare: Temperature, performance consistency, battery drain
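To compare temperatures with numbers rather than by touch, you can read the kernel's thermal zones directly (for example from Termux, or over `adb shell`). This is a rough sketch: zone names, paths, and units vary by device, and many Android kernels report millidegrees Celsius:

```python
# Minimal thermal logger to run alongside a chat session.
# Zone paths/units vary by device; adjust for your hardware.
import glob
import time

def read_temps():
    """Return {zone_name: temp_celsius} for readable thermal zones."""
    temps = {}
    for zone in glob.glob("/sys/class/thermal/thermal_zone*"):
        try:
            with open(zone + "/type") as f:
                name = f.read().strip()
            with open(zone + "/temp") as f:
                temps[name] = int(f.read()) / 1000  # millidegrees to Celsius
        except (OSError, ValueError):
            pass
    return temps

for _ in range(3):  # extend the sample count for a full 30+ minute session
    print(time.strftime("%H:%M:%S"), read_temps())
    time.sleep(2)
```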
This isn't theoretical. This is reproducible science. The industry needs to update its definition of "mobile-ready" to include thermal and power constraints.