Google releases DiffusionGemma, an open model that generates text up to 4x faster

Google's experimental open model swaps word-by-word generation for parallel diffusion, reaching 1,000-plus tokens a second on a single GPU.

Written by H Hillary

Updated Jul 22, 2026

June 10, 2026

Google has released DiffusionGemma, an open-weight language model that generates text by refining whole blocks of tokens at once rather than one word at a time, and which the company said runs up to four times faster than conventional models on the same hardware. The weights are available on Hugging Face under an Apache 2.0 licence.

The model departs from the autoregressive design used by most large language models, which predict text one token after the next. According to Google, DiffusionGemma drafts a block of 256 tokens and revises it over several passes, adapting the denoising technique used in image generation. The company reported throughput above 1,000 tokens per second on a single Nvidia H100 and above 700 on a consumer RTX 5090.
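The difference in the number of model calls can be illustrated with a toy sketch. It is not Google's algorithm: it assumes a masked-style variant of text diffusion in which a block starts fully masked and the most confident proposals are committed on each pass. The names `toy_denoiser` and `refine_block`, and the choice of 8 passes, are hypothetical; the source says only "several passes".

```python
import random

MASK = "_"

def toy_denoiser(block, target, rng):
    """Hypothetical stand-in for a model: for every masked slot,
    propose a token and a made-up confidence score.
    (This toy always proposes the right token; a real model would not.)"""
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(block)
        if tok == MASK
    }

def refine_block(target, passes, seed=0):
    """Fill a whole block in a fixed number of passes.
    Each pass looks at every position at once and commits the
    most confident share of the remaining masked slots."""
    rng = random.Random(seed)
    block = [MASK] * len(target)
    used = 0
    for step in range(passes):
        proposals = toy_denoiser(block, target, rng)
        if not proposals:
            break
        used += 1
        remaining_passes = passes - step
        # Ceiling division, so the block is complete after the last pass.
        quota = -(-len(proposals) // remaining_passes)
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in best[:quota]:
            block[i] = proposals[i][0]
    return block, used

def autoregressive_passes(n_tokens):
    """An autoregressive decoder needs one model call per generated token."""
    return n_tokens

if __name__ == "__main__":
    target = ["tok"] * 256  # one 256-token block, as in Google's description
    _, used = refine_block(target, passes=8)
    print("block refinement passes:", used)
    print("autoregressive passes:", autoregressive_passes(len(target)))
```

Under these assumptions the block is finished in 8 model calls instead of 256. That ratio is not a speed-up figure: each refinement pass processes the whole block, so it costs more than a single autoregressive step, which is one reason the reported gain is "up to four times" rather than the raw ratio of calls.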

DiffusionGemma is a mixture-of-experts model with 26 billion total parameters, of which Google said about 3.8 billion are active at each step. The company said the model fits within 18GB of memory when quantised, within range of high-end consumer graphics cards.
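A rough back-of-envelope calculation shows how a 26-billion-parameter model could land near that figure. The sketch below counts weight storage only and assumes, as is usual for mixture-of-experts models without offloading, that all experts stay resident in memory even though only a fraction are active per step. The bit widths are illustrative assumptions; the source does not say which quantisation scheme Google used.

```python
TOTAL_PARAMS_B = 26.0   # total parameters, in billions (from the article)
ACTIVE_PARAMS_B = 3.8   # parameters active per step, in billions (from the article)

def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).
    Ignores activations, caches and any layers kept at higher precision."""
    return params_billion * bits_per_weight / 8

if __name__ == "__main__":
    # Bit widths below are assumptions for illustration only.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS_B, bits):.1f} GB")
    share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
    print(f"active share per step: {share:.1%}")
```

At 4 bits per weight the weights alone come to 13 GB, which would leave headroom under 18GB for runtime overhead; at 8 or 16 bits the weights alone would exceed it. The active share, roughly 15% of the total, reduces compute per step rather than the memory needed to hold the model.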

Google was explicit about the cost of the approach. It said the model's output quality is lower than that of its standard Gemma models, and described DiffusionGemma as experimental, better suited to latency-sensitive tasks such as inline editing and code completion than to general-purpose use.

The release adds to a small body of research on text diffusion, an approach that has drawn interest as a route to faster generation but has not yet matched autoregressive systems on quality at scale. Because the model ships under a permissive licence that allows commercial use, outside researchers can now test those speed and quality claims independently, as early technical write-ups have begun to do.
