aravhawk/gemma4:26b

543 Downloads Updated 1 week ago

Gemma 4 26B Optimized for 16GB VRAM via Q3 Quantization

tools thinking 26b

ollama run aravhawk/gemma4:26b

curl http://localhost:11434/api/chat \
  -d '{
    "model": "aravhawk/gemma4:26b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='aravhawk/gemma4:26b',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'aravhawk/gemma4:26b',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
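By default, the /api/chat endpoint used in the cURL request above streams its reply as a series of JSON objects, one per line. The following is a minimal standard-library Python sketch that sends the same request with "stream": false, so the whole reply arrives as a single JSON object. It assumes an Ollama server on the default port 11434 with this model already pulled. The helper names build_chat_request and send_chat are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint (assumed)
MODEL = "aravhawk/gemma4:26b"


def build_chat_request(prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for /api/chat (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a line-delimited stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt: str) -> str:
    """Send the request and return the reply text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

Usage, with the server running: print(send_chat("Hello!")).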

Details

d17674061c61 · 13GB

model · 13GB
  arch: gemma4
  parameters: 25.2B
  quantization: Q3_K_S
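As a rough check on the figures above, you can get the average storage per weight by dividing the file size by the parameter count. The following is a small Python sketch of that arithmetic. It assumes "13GB" means 13 × 10^9 bytes. Because the page rounds both numbers, the results are approximate.

```python
def bits_per_weight(size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter."""
    return size_bytes * 8 / n_params


N_PARAMS = 25.2e9  # parameter count listed on the page
Q3_SIZE = 13e9     # listed file size, assumed to be decimal gigabytes

avg_bits = bits_per_weight(Q3_SIZE, N_PARAMS)
fp16_size_gb = N_PARAMS * 16 / 8 / 1e9  # FP16 stores 16 bits (2 bytes) per weight

print(f"average bits per weight: {avg_bits:.2f}")   # about 4.13
print(f"FP16 weights alone: {fp16_size_gb:.1f} GB")  # 50.4
```

The average comes out above the nominal 3 bits. This is likely because K-quant mixes such as Q3_K_S store per-block scales and keep some tensors at higher precision. In either case, 13GB of weights leaves only about 3GB of a 16GB card for the KV cache and runtime buffers, while FP16 weights alone would need roughly 50GB.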

params · 132B
{ "num_ctx": 100000, "num_gpu": 99, "repeat_penalty": 1, "stop": [ "<end_of_
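The params layer sets defaults such as num_ctx (the context length, here 100,000 tokens) and num_gpu (the number of layers offloaded to the GPU, where 99 effectively means all of them). The ollama Python library lets you override these per request through the options argument of chat. The following is a minimal sketch for a card with less headroom, where a smaller context may be needed. The 32,768 value and the helper names build_options and chat_with_smaller_context are illustrative assumptions, not settings recommended by the model author.

```python
from typing import Any, Dict


def build_options(num_ctx: int = 32768, num_gpu: int = 99) -> Dict[str, Any]:
    """Per-request overrides for the model's default params (hypothetical helper).

    num_ctx: context window in tokens; a smaller value shrinks the KV cache.
    num_gpu: number of layers to offload to the GPU; 99 is a common "all layers" value.
    """
    if num_ctx <= 0:
        raise ValueError("num_ctx must be positive")
    return {"num_ctx": num_ctx, "num_gpu": num_gpu}


def chat_with_smaller_context(prompt: str, num_ctx: int = 32768) -> str:
    """Chat with a reduced context window (needs a running Ollama server)."""
    from ollama import chat  # imported lazily so build_options works without the library

    response = chat(
        model="aravhawk/gemma4:26b",
        messages=[{"role": "user", "content": prompt}],
        options=build_options(num_ctx=num_ctx),
    )
    return response.message.content
```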

template · 266B
{{ if .System }}<start_of_turn>user {{ .System }}<end_of_turn> {{ end }}{{ range .Messages }}{{ if e

Readme

Gemma 4 26B (A4B) with an aggressive 3-bit K-quant (Q3_K_S) applied.

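Although Gemma 4 is relatively resistant to quantization, expect a noticeable loss in quality compared with Q4, Q8, or FP16 versions.
The model is nonetheless fast because its mixture-of-experts (MoE) architecture activates only a subset of its parameters for each token. It reaches 132 tokens/s on an RTX 5070 Ti with the context length set to 100,000 tokens.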
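To compare throughput on your own hardware, use the final response from Ollama. It reports eval_count (tokens generated) and eval_duration (generation time in nanoseconds), and tokens per second follows directly from those two fields. The following is a minimal sketch using the ollama Python library. The helper names tokens_per_second and measure_speed are hypothetical, and results will vary with GPU, context length, and prompt.

```python
NS_PER_SECOND = 1_000_000_000  # Ollama reports durations in nanoseconds


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from a response's eval_count and eval_duration fields."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / NS_PER_SECOND)


def measure_speed(prompt: str = "Hello!") -> float:
    """Run one non-streaming chat and return its generation speed (needs a running server)."""
    from ollama import chat  # imported lazily so tokens_per_second works standalone

    response = chat(
        model="aravhawk/gemma4:26b",
        messages=[{"role": "user", "content": prompt}],
    )
    return tokens_per_second(response.eval_count, response.eval_duration)
```

At the quoted 132 tokens/s, each token takes about 7.6 ms. A 1,000-token reply would therefore need roughly 7.6 s of generation time, not counting prompt processing.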

Credit to the Unsloth team for the GGUF this model is based on:

https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF