542 downloads · Updated 1 week ago

Gemma 4 26B Optimized for 16GB VRAM via Q3 Quantization

tools · thinking · 26b
ollama run aravhawk/gemma4:26b

Applications

  • Claude Code: ollama launch claude --model aravhawk/gemma4:26b
  • Codex: ollama launch codex --model aravhawk/gemma4:26b
  • OpenCode: ollama launch opencode --model aravhawk/gemma4:26b
  • OpenClaw: ollama launch openclaw --model aravhawk/gemma4:26b
  • Hermes Agent: ollama launch hermes --model aravhawk/gemma4:26b

Models


gemma4:26b

13GB · 256K context window · Text · 1 week ago
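
A quick way to confirm these numbers locally, assuming the model has already been pulled, is ollama show, which prints details such as the parameter count, context length, and quantization level:

  ollama show aravhawk/gemma4:26b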

Readme

Gemma 4 26B (A4B) with an aggressive 3-bit K-quant applied

  • While Gemma 4 is relatively resistant to quantization, expect noticeable quality loss compared to Q4, Q8, or FP16.
  • Thanks to its mixture-of-experts (MoE) architecture, the model is fast, reaching roughly 132 tok/sec on an RTX 5070 Ti with the context window set to 100,000 tokens (see the example below for raising the context length).
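
Ollama's default context window is much smaller than 100,000 tokens, so to reproduce that setup you need to raise it yourself. A minimal sketch, assuming a recent Ollama build where the num_ctx option is available (the prompt text is only a placeholder):

  # Interactive session: raise the context window for this chat
  ollama run aravhawk/gemma4:26b
  >>> /set parameter num_ctx 100000

  # Or per request via the local API
  curl http://localhost:11434/api/generate -d '{
    "model": "aravhawk/gemma4:26b",
    "prompt": "Summarize the attached log.",
    "options": { "num_ctx": 100000 }
  }'

Keep in mind that a long context adds KV-cache memory on top of the 13GB of weights, so on a 16GB card part of the cache or model may spill into system RAM.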

Credit to the Unsloth team for the GGUF behind this model.