543 1 week ago

Gemma 4 26B Optimized for 16GB VRAM via Q3 Quantization

tools thinking 26b
ollama run aravhawk/gemma4:26b
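
Besides the interactive command above, the model can also be called through Ollama's local HTTP API. The following is a minimal sketch, not part of the original page: it assumes an Ollama server running on the default port 11434 with this model already pulled, and the helper names `build_payload` and `generate` and the 8192-token default are hypothetical choices for illustration. It overrides `num_ctx` per request, which may help if the 100,000-token default does not fit alongside other GPU workloads.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "aravhawk/gemma4:26b"


def build_payload(prompt, num_ctx=8192):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # request a single JSON object, not a stream
        "options": {"num_ctx": num_ctx},  # per-request context-length override
    }


def generate(prompt, num_ctx=8192, url=OLLAMA_URL):
    """Send the prompt to the server and return the generated text."""
    body = json.dumps(build_payload(prompt, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(generate("Explain mixture-of-experts in two sentences."))` would print the model's reply; `build_payload` can be inspected on its own without a server.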

Details


d17674061c61 · 13GB · gemma4 · 25.2B · Q3_K_S
{ "num_ctx": 100000, "num_gpu": 99, "repeat_penalty": 1, "stop": [ "<end_of_
{{ if .System }}<start_of_turn>user {{ .System }}<end_of_turn> {{ end }}{{ range .Messages }}{{ if e

Readme

Gemma 4 26B (A4B) with an aggressive 3-bit K-quant (Q3_K_S) applied.

  • Gemma 4 is relatively quant-resistant, but expect a noticeable quality loss compared to Q4, Q8, or FP16.
  • The model is fast thanks to its mixture-of-experts (MoE) architecture, reaching 132 tok/sec on an RTX 5070 Ti with the context length set to 100,000.
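
The figures quoted on this page allow a quick back-of-the-envelope check. The sketch below is only arithmetic on the listed numbers (13GB file, 25.2B parameters, 16GB VRAM target, 132 tok/sec); it assumes 1 GB = 10^9 bytes and that the file is almost entirely weights, and the function names are hypothetical. Under those assumptions the average works out to roughly 4.1 bits per weight, which would be consistent with K-quants storing per-block scales and keeping some tensors at higher precision, and it leaves about 3 GB of a 16GB card for the KV cache and runtime overhead.

```python
# Figures quoted on this page.
FILE_SIZE_GB = 13        # listed model size
PARAMS_BILLION = 25.2    # listed parameter count
VRAM_GB = 16             # target card size from the title
TOKENS_PER_SEC = 132     # reported on an RTX 5070 Ti


def effective_bits_per_weight(size_gb, params_billion):
    """Average stored bits per parameter.

    Assumption: 1 GB = 10**9 bytes and the file is almost all weights.
    """
    return size_gb * 8 / params_billion


def headroom_gb(vram_gb, size_gb):
    """VRAM left after loading the weights (KV cache, buffers, etc.)."""
    return vram_gb - size_gb


def seconds_to_generate(tokens, tokens_per_sec):
    """Time to generate a reply of the given length at a steady rate."""
    return tokens / tokens_per_sec


if __name__ == "__main__":
    bpw = effective_bits_per_weight(FILE_SIZE_GB, PARAMS_BILLION)
    print(f"effective bits per weight: {bpw:.2f}")
    print(f"VRAM headroom: {headroom_gb(VRAM_GB, FILE_SIZE_GB)} GB")
    print(f"1000 tokens: {seconds_to_generate(1000, TOKENS_PER_SEC):.1f} s")
```

The reported throughput is a single measurement on one GPU, so the timing estimate is indicative only.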

Credit to the Unsloth team for the GGUF behind this model.