LiteRT-LM: Blazing Fast On-Device GenAI with WebGPU Support
Originally published on Google Developers Blog – AI

Summary & Key Takeaways
- LiteRT-LM optimizes on-device GenAI for Gemma 4.
- It offers memory-efficient dynamic loading and Multi-Token Prediction for speed.
- Advanced orchestration tools like Thinking Mode are included.
- New native Swift APIs and WebGPU-accelerated JavaScript APIs are introduced.
- The WebGPU-accelerated JavaScript APIs enable high-performance, serverless inference in the browser.
Our Commentary
Blazing fast on-device GenAI with LiteRT-LM and, crucially, WebGPU-accelerated JavaScript APIs? This is huge for bringing powerful AI directly into the browser without server round trips. Multi-Token Prediction sounds like a clever speed optimization, and Thinking Mode like a useful orchestration tool. This is the kind of infrastructure that makes the "agentic web" feel more tangible.