```shell
ollama run brianmatzelle/gpt-oss-heretic:120b
ollama launch claude --model brianmatzelle/gpt-oss-heretic:120b
ollama launch codex-app --model brianmatzelle/gpt-oss-heretic:120b
ollama launch openclaw --model brianmatzelle/gpt-oss-heretic:120b
ollama launch hermes --model brianmatzelle/gpt-oss-heretic:120b
ollama launch codex --model brianmatzelle/gpt-oss-heretic:120b
ollama launch opencode --model brianmatzelle/gpt-oss-heretic:120b
```
An abliterated build of openai/gpt-oss-120b, processed with Heretic to remove refusal behavior while preserving general capability. Repackaged so it actually runs in Ollama (the straight upstream GGUF doesn’t).
```shell
ollama pull brianmatzelle/gpt-oss-heretic:120b
```
The bartowski/kldzj heretic GGUFs are great, but none of them work in Ollama 0.24 out of the box. Symptoms range from "does not support tools" to "panic: failed to sample token" depending on which variant you try. Diagnosing this took a few hours.
This build is the bartowski BF16 variant with the following fixes baked in:
| Issue | Fix |
|---|---|
| GGUF arch keyed as `gpt-oss` (hyphen); Ollama's sampler keys on `gptoss` | Rewrote all `gpt-oss.*` metadata keys → `gptoss.*` |
| `tokenizer.ggml.pre = gpt-4o`; Ollama expects `default` | Patched to `default` |
| No Modelfile `TEMPLATE` → Ollama can't render the harmony format | Copied the stock `gpt-oss:120b` Go template |
| Spurious `{{- end -}}` in `ollama show --modelfile` output | Stripped (Ollama bug: the round-trip isn't faithful) |
| GGUF only has one `eos_token_id`; the harmony parser needs multiple to know when to stop | Added explicit `PARAMETER stop` for `<\|return\|>`, `<\|call\|>`, `<\|endoftext\|>` |
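The metadata rewrite is mechanical. A minimal sketch of the rule, applied here to a plain key→value dict for illustration rather than to a real GGUF file (the actual fix edits the GGUF metadata with proper tooling):

```python
def fix_metadata(meta: dict) -> dict:
    """Illustrative gpt-oss -> gptoss metadata rewrite (not the real GGUF tooling)."""
    fixed = {}
    for key, value in meta.items():
        # Rewrite every gpt-oss.* key to gptoss.* so Ollama's sampler finds them
        if key.startswith("gpt-oss."):
            key = "gptoss." + key[len("gpt-oss."):]
        fixed[key] = value
    # The architecture value itself also carries the hyphen
    if fixed.get("general.architecture") == "gpt-oss":
        fixed["general.architecture"] = "gptoss"
    # Ollama expects the default pre-tokenizer, not gpt-4o
    if fixed.get("tokenizer.ggml.pre") == "gpt-4o":
        fixed["tokenizer.ggml.pre"] = "default"
    return fixed
```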
If you skip any of these, you get one of:
- `Error: error parsing tool call: invalid character '<' after top-level value` (missing stop tokens)
- `Error: model 'X' does not support tools` (missing template)
- `panic: failed to sample token` (arch mismatch or wrong quant variant)
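Put together, the Modelfile side of the fix looks roughly like this. A sketch only: the filename is hypothetical, and the real `TEMPLATE` is the full harmony Go template copied from stock `gpt-oss:120b` (viewable with `ollama show --template gpt-oss:120b`), elided here:

```
FROM ./gpt-oss-120b-heretic-bf16.gguf

# Stock gpt-oss:120b harmony Go template goes here (elided)
TEMPLATE """..."""

# Harmony emits several terminators; Ollama needs them all declared
PARAMETER stop <|return|>
PARAMETER stop <|call|>
PARAMETER stop <|endoftext|>
```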
bartowski publishes the kldzj heretic in two layouts. Use BF16 if your runner is Ollama:
| Variant | Attention tensors | Experts | Ollama? |
|---|---|---|---|
| MXFP4_MOE (~63 GB) | Q8_0 | MXFP4 | ❌ Ollama's gpt-oss kernel panics on Q8_0 attention |
| bf16 (~65 GB) | BF16 | MXFP4 | ✅ Matches stock gpt-oss:120b layout |
```shell
curl -s http://localhost:11434/api/chat -d '{
  "model": "brianmatzelle/gpt-oss-heretic:120b",
  "messages": [{"role": "user", "content": "What is 41 times 19? Use the calculator."}],
  "tools": [{"type": "function", "function": {"name": "calculator", "description": "multiplies two numbers", "parameters": {"type": "object", "properties": {"a": {"type": "number"}, "b": {"type": "number"}}, "required": ["a", "b"]}}}],
  "stream": false
}' | jq .message
```
You should get thinking + a clean tool_calls array — not a parser error.
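To close the loop, parse `message.tool_calls`, run the tool, and send the result back as a `"tool"` role message. A minimal sketch of the local dispatch step; the `calculator` name and argument shape follow the request above, and handling both dict and JSON-string `arguments` is a defensive assumption, not something this model requires:

```python
import json

def run_tool_call(call: dict) -> str:
    """Execute one entry from the response's message.tool_calls array."""
    fn = call["function"]
    args = fn["arguments"]
    if isinstance(args, str):  # some clients serialize arguments as a JSON string
        args = json.loads(args)
    if fn["name"] == "calculator":
        return str(args["a"] * args["b"])
    raise ValueError(f"unsupported tool: {fn['name']}")
```

Send the returned string back to `/api/chat` as a message with role `"tool"`, and the model will produce the final answer.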
Capability holds up against stock gpt-oss:120b on hard reasoning. Tested on an NVIDIA DGX Spark (GB10, 128 GB unified LPDDR5x): sits at ~58 GB of GPU weights plus ~5 GB of KV cache at 128k context. Should run on anything with 80+ GB of (V)RAM.