
Gemma 4 Turbo is an optimized version of Google's Gemma 4 (9B) model, achieving 51% faster CPU inference through int4 quantization and performance tuning. Ideal for local AI assistants, tool calling, and chat applications on Windows systems without a GPU.

Capabilities: vision, tools, thinking, audio · Sizes: e2b, e4b, 26b, 31b
Digest: 17eff2f85b7f · 330B
Gemma 4 Turbo e2b is the compact edition of the Gemma 4 Turbo family, optimized for machines with limited RAM (8GB and up). It achieves maximum tokens-per-second on CPU through int4 quantization, KV cache quantization (Stage 1 TurboQuant), and TurboQuant inference tuning. Ideal for lightweight chat, tool calling, and edge deployments.
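KV cache quantization applies the same idea to the attention key/value cache, which dominates memory at long context lengths. The sketch below stores each cached vector as int8 codes plus a per-vector scale, cutting cache memory roughly 4x versus float32. This is a generic illustration under assumed details; the "Stage 1 TurboQuant" internals are not described in the source, and the class name is hypothetical.

```python
# Illustrative int8 KV-cache quantization (not the model's real implementation).

class QuantizedKVCache:
    def __init__(self):
        self.entries = []  # one (int8 codes, scale) pair per cached vector

    def append(self, vec):
        """Quantize a key/value vector to int8 with a per-vector scale."""
        max_abs = max(abs(x) for x in vec) or 1.0
        scale = max_abs / 127.0
        codes = [max(-128, min(127, round(x / scale))) for x in vec]
        self.entries.append((codes, scale))

    def get(self, i):
        """Dequantize entry i back to approximate floats for attention."""
        codes, scale = self.entries[i]
        return [c * scale for c in codes]

cache = QuantizedKVCache()
cache.append([0.25, -1.0, 0.5])
print(cache.get(0))  # approximately the original vector
```

Real implementations quantize keys and values in contiguous blocks and dequantize on the fly inside the attention kernel, trading a small amount of accuracy for a much larger usable context on 8GB machines.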