- #ai
- #llm
- #research
3X LLM Inference Speedup on TPUs with Diffusion-Style Speculative Decoding
Researchers achieved a 3.13x average speedup in LLM inference on Google TPUs using DFlash, a block-diffusion speculative decoding method that is now open-source and integrated into vLLM.
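
For context, speculative decoding lets a small drafter propose tokens that the large target model then verifies in one forward pass; a diffusion-style drafter like DFlash proposes a whole block of tokens in a single step instead of one at a time. Below is a minimal, self-contained sketch of block draft-and-verify with greedy verification; the toy `drafter` and `target` functions are illustrative assumptions, not DFlash's actual implementation or the vLLM API.

```python
from typing import Callable, List

def speculative_step(
    context: List[int],
    draft_block: Callable[[List[int], int], List[int]],
    target_greedy: Callable[[List[int]], List[int]],
    k: int,
) -> List[int]:
    """One draft-then-verify step; returns the tokens accepted this step."""
    proposal = draft_block(context, k)  # drafter emits K tokens in one shot
    # One target pass over context + proposal gives, for every prefix,
    # the token the target itself would pick next (greedy verification).
    target_next = target_greedy(context + proposal)
    accepted: List[int] = []
    for i, tok in enumerate(proposal):
        # proposal[i] is accepted iff it matches the target's greedy
        # choice after the prefix context + proposal[:i].
        if tok == target_next[len(context) + i - 1]:
            accepted.append(tok)
        else:
            break
    # Standard bonus token: the target's own next token after the
    # accepted prefix, so every step emits at least one token.
    accepted.append(target_next[len(context) + len(accepted) - 1])
    return accepted

# Toy demo: the "target" continues integers by +1; the drafter follows
# the same rule but injects a wrong guess at position 2.
if __name__ == "__main__":
    def target(seq: List[int]) -> List[int]:
        return [t + 1 for t in seq]

    def drafter(ctx: List[int], k: int) -> List[int]:
        block = [ctx[-1] + i + 1 for i in range(k)]
        if k > 2:
            block[2] += 7  # deliberate draft error
        return block

    print(speculative_step([1, 2, 3], drafter, target, 5))  # -> [4, 5, 6]
```

In this toy run the target accepts two draft tokens plus its own bonus token, so one target forward pass emits three tokens instead of one; getting several tokens per target pass is the mechanism behind the reported ~3x speedup.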
