Real-World Mobile Testing: Why Stock Models Fail

We tested Google's "mobile-ready" Gemma 4 on a high-end phone with 8GB RAM. It overheated and slowed to a crawl. Then we tested our nano version. It stayed cool. Here's the technical breakdown.

The Test

Google markets Gemma 4 as "optimized for mobile deployment." That's a strong claim. So we tested it on actual hardware:

  • Device: High-end phone, 8GB RAM, 256GB storage
  • App: Offline AI (Hugging Face integration)
  • Model: Stock gemma4:e2b (7.2 GB) — the smallest official model
  • Task: General conversation, typical Q&A

If Google's smallest model runs well on a high-end phone, that supports the mobile-ready claim. If it fails here, it will struggle even more on phones with less RAM.

Result: Complete Failure

The phone became uncomfortably hot within minutes. Performance degraded rapidly. After 5-10 minutes, responses were too slow to be usable. This is not a "mobile-ready" experience.

Why It Failed

1. Memory Pressure

Android typically uses ~2GB for the OS. That leaves about 6GB, or closer to 5.5GB once system services are counted. Loading a 7.2 GB model:

  • Immediate memory swap: Model doesn't fit in available RAM
  • Constant disk I/O: System swaps memory pages to storage
  • Performance degradation: Flash storage is roughly 100x slower than RAM (the exact gap varies by device and access pattern)
  • CPU thrashing: Processor wastes cycles managing memory

2. Memory Bandwidth

Even if the model fits, inference requires reading weights constantly:

  • 7.2 GB at 20 tokens/sec = massive bandwidth demand
  • Mobile RAM bandwidth: Much lower than desktop DDR5
  • Heat generation: Memory access generates significant thermal output
  • Battery drain: High bandwidth = high power consumption
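
To put a rough number on the first bullet, here is a minimal sketch in Python. It assumes that every weight byte is read from memory once per generated token, which is a simplifying assumption and not something we measured; a runtime that skips or caches part of the weights would read less. The function name is hypothetical. The 7.2 GB size and 20 tokens/sec rate are the figures quoted above.

```python
def required_bandwidth_gb_s(model_size_gb: float, tokens_per_sec: float) -> float:
    """Estimate sustained memory read bandwidth in GB/s.

    Assumption (not measured): each generated token requires reading
    every weight byte once, so traffic per token equals the model size.
    """
    return model_size_gb * tokens_per_sec


stock = required_bandwidth_gb_s(7.2, 20)
print(f"Stock model at 20 tokens/sec: ~{stock:.0f} GB/s")
```

Under that assumption the estimate comes out at about 144 GB/s. Treat it as a sense of scale rather than a benchmark result.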

3. Thermal Throttling

Phones are thermally constrained by design:

  • No active cooling: Unlike laptops/desktops, phones rely on passive cooling
  • Small thermal mass: Heats up quickly, cools down slowly
  • Throttling kicks in early: Typically at 45-50°C to prevent damage
  • Performance collapse: CPU/GPU clocks reduced by 50%+ when throttled

The Death Spiral

Large model → High memory bandwidth → Heat generation → Thermal throttling → Slower inference → Longer sessions → More heat → Deeper throttling → Unusable performance

Testing gemma4-nano

Next test: Our optimized gemma4-nano:e2b (3.1 GB) on the same phone:

| Metric | Stock gemma4:e2b | gemma4-nano:e2b |
|---|---|---|
| Model Size | 7.2 GB | 3.1 GB |
| RAM Fit | ❌ Requires swap | ✅ Fits comfortably |
| Temperature | 🔥 Hot | ✅ Warm but stable |
| Performance | ❌ Slow, degrading | ✅ Fast, consistent |
| CPU Clocks | ⚠️ Throttled | ✅ Pinned (working hard) |
| Usability | ❌ No | ✅ Yes |

Why Nano Succeeds

The nano model works because it respects mobile constraints:

  • Fits in RAM: 3.1 GB + OS + apps = under 6GB total
  • Lower bandwidth: About 57% smaller, so about 57% less weight data to read per token
  • Stays cool: Reduced memory access = less heat
  • Sustained performance: No thermal throttling = consistent speed

The CPU still runs hard (clocks pinned), but critically: it doesn't overheat. The thermal envelope is manageable. This is the difference between "technically possible" and "actually usable."

The Math

Let's break down why size matters so much on mobile:

Available RAM Budget

  • Total RAM: 8GB
  • Android OS: ~2GB
  • System services: ~500MB
  • Available for apps: ~5.5GB

Stock Model (7.2 GB)

  • Needs: 7.2 GB
  • Available: 5.5 GB
  • Deficit: 1.7 GB → Constant swapping

Nano Model (3.1 GB)

  • Needs: 3.1 GB
  • Available: 5.5 GB
  • Headroom: 2.4 GB → Room to breathe

That 2.4 GB of headroom is the difference between thermal throttling and sustained performance.
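
The budget arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a measurement tool: the class and method names are hypothetical, the OS and system-service figures are the approximations used in this post, and the sketch counts model weights only (it ignores runtime overhead such as the context cache).

```python
from dataclasses import dataclass


@dataclass
class RamBudget:
    total_gb: float = 8.0      # device RAM from the test setup
    os_gb: float = 2.0         # approximate Android OS footprint
    services_gb: float = 0.5   # approximate system services footprint

    @property
    def available_gb(self) -> float:
        """RAM left for apps after the OS and system services."""
        return self.total_gb - self.os_gb - self.services_gb

    def margin_gb(self, model_gb: float) -> float:
        """Positive result = headroom, negative result = deficit."""
        return self.available_gb - model_gb


budget = RamBudget()
for name, size_gb in [("stock gemma4:e2b", 7.2), ("gemma4-nano:e2b", 3.1)]:
    margin = budget.margin_gb(size_gb)
    label = "headroom" if margin >= 0 else "deficit"
    print(f"{name}: {abs(margin):.1f} GB {label}")
```

Changing `total_gb` lets you repeat the same check for a phone with a different amount of RAM.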

Industry Implications

For Google

Gemma 4 is technically impressive, but the "mobile-ready" claim is misleading:

  • "Mobile-ready" should mean: Fast, cool, usable on typical phones
  • Google's apparent definition: Can technically load on an 8GB+ device
  • The gap: Lab benchmarks miss thermal constraints

For Users

If you have a phone with 8GB RAM and try to run stock Gemma 4:

  • Expect overheating
  • Expect performance degradation
  • Expect battery drain
  • Consider nano instead

For Developers

Building mobile AI apps? Key takeaways:

  • Target 4GB or less for models on 8GB phones
  • Aim for 3GB or less on budget phones (4-6GB RAM)
  • Test on real devices for extended sessions (30+ minutes)
  • Monitor thermals, not just performance metrics
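
One simple way to act on the last two points is to log tokens/sec at regular intervals during a long session and flag the moment throughput collapses. The sketch below is a hypothetical helper, not part of any app or library mentioned here. The 50% drop threshold is an assumption chosen to mirror the clock reduction described earlier, and the sample readings are made up for illustration.

```python
from statistics import mean


def detect_throttling(samples, baseline_window=3, drop_ratio=0.5):
    """Return the index of the first reading that falls below
    drop_ratio * baseline, or None if throughput holds up.

    samples: tokens/sec readings taken at regular intervals.
    The baseline is the mean of the first `baseline_window` readings.
    """
    if len(samples) <= baseline_window:
        return None  # not enough data to compare against a baseline
    baseline = mean(samples[:baseline_window])
    for i, rate in enumerate(samples[baseline_window:], start=baseline_window):
        if rate < drop_ratio * baseline:
            return i
    return None


# Illustrative readings only, not measurements from the tests above.
steady = [10.0, 10.2, 9.9, 9.8, 10.1, 9.7]
degrading = [10.0, 10.2, 9.9, 8.0, 6.0, 4.5, 3.0]
print(detect_throttling(steady))
print(detect_throttling(degrading))
```

A throughput log like this only shows the symptom. Pair it with a temperature reading from the device where one is available.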

What "Mobile-Ready" Actually Means

There's a massive gap between these definitions:

| Aspect | Marketing Definition | Reality Definition |
|---|---|---|
| Size | "Fits on device" | "Fits in RAM with headroom" |
| Performance | "Can run" | "Runs fast without throttling" |
| Thermals | (ignored) | "Stays cool for hours" |
| Battery | (ignored) | "Reasonable power draw" |
| User Experience | "Technically works" | "Pleasant to use daily" |

The Bottom Line

A model isn't "mobile-ready" if it overheats your phone. Period. Real mobile readiness means sustained performance at comfortable temperatures.

Try It Yourself

You can validate these findings on your own phone:

  • Install Offline AI (supports Hugging Face models)
  • Try stock gemma4:e2b from Google
  • Try gemma4-nano:e2b from ssfdre38/gemma4-nano-gguf
  • Compare: Temperature, performance consistency, battery drain

This isn't theoretical. This is reproducible science. The industry needs to update its definition of "mobile-ready" to include thermal and power constraints.
