
GPT-OSS 120B (Heretic)

An abliterated build of openai/gpt-oss-120b, processed with Heretic to remove refusal behavior while preserving general capability. Repackaged so it actually runs in Ollama (the straight upstream GGUF doesn’t).

TL;DR

ollama pull brianmatzelle/gpt-oss-heretic:120b
  • Base: gpt-oss-120b (117B params, ~5B active, MoE)
  • Abliteration: Heretic v1.0.1, via kldzj/gpt-oss-120b-heretic → bartowski/kldzj_gpt-oss-120b-heretic-GGUF (BF16 variant)
  • Context: 131,072 tokens
  • Tools: ✅ working (verified end-to-end)
  • Thinking: ✅ working
  • Format: BF16 attention + MXFP4 experts (matches stock gpt-oss:120b exactly; ~65 GB)

Why this exists

The bartowski/kldzj heretic GGUFs are great, but none of them work in Ollama 0.24 out of the box. Symptoms range from “does not support tools” to “panic: failed to sample token” depending on which variant you try. Diagnosing this took a few hours.

This build is the bartowski BF16 variant with the following fixes baked in:

| Issue | Fix |
|---|---|
| GGUF arch keyed as `gpt-oss` (hyphen); Ollama’s sampler keys on `gptoss` | Rewrote all `gpt-oss.*` metadata keys → `gptoss.*` |
| `tokenizer.ggml.pre = gpt-4o`; Ollama expects `default` | Patched to `default` |
| No Modelfile `TEMPLATE`, so Ollama can’t render the harmony format | Copied the stock `gpt-oss:120b` Go template |
| Spurious `{{- end -}}` in `ollama show --modelfile` output | Stripped (Ollama bug: the round-trip isn’t faithful) |
| GGUF has only one `eos_token_id`; the harmony parser needs multiple stops | Added explicit `PARAMETER stop` for <\|return\|>, <\|call\|>, <\|endoftext\|> |
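The arch fix is a mechanical key rename. A sketch of that logic on a plain metadata dict (illustrative only: editing a real GGUF requires the gguf-py tooling, and the key names below are examples, not an exhaustive list):

```python
def rewrite_arch_keys(metadata: dict, old: str = "gpt-oss", new: str = "gptoss") -> dict:
    """Rename architecture-scoped GGUF keys so Ollama's sampler finds them,
    e.g. 'gpt-oss.context_length' -> 'gptoss.context_length'."""
    out = {}
    for key, value in metadata.items():
        if key == "general.architecture" and value == old:
            out[key] = new  # the arch name itself also changes
        elif key.startswith(old + "."):
            out[new + key[len(old):]] = value  # re-prefix gpt-oss.* -> gptoss.*
        else:
            out[key] = value  # everything else passes through untouched
    return out

fixed = rewrite_arch_keys({
    "general.architecture": "gpt-oss",
    "gpt-oss.context_length": 131072,
    "tokenizer.ggml.pre": "default",
})
```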

If you skip any of these, you get one of:

  • Error: error parsing tool call: invalid character '<' after top-level value (missing stop tokens)
  • Error: model 'X' does not support tools (missing template)
  • panic: failed to sample token (arch mismatch or wrong quant variant)
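The template and stop-token fixes live in the Modelfile. A minimal sketch, with placeholders: the FROM path is a hypothetical filename, and the TEMPLATE body (elided) must be the stock gpt-oss:120b Go template, e.g. recovered via `ollama show gpt-oss:120b --template`:

```
# Hypothetical local path to the bartowski BF16 GGUF
FROM ./kldzj_gpt-oss-120b-heretic-bf16.gguf

# Stock gpt-oss:120b Go template goes here (body omitted)
TEMPLATE """
...
"""

# Harmony stop tokens so the parser knows where a turn ends
PARAMETER stop "<|return|>"
PARAMETER stop "<|call|>"
PARAMETER stop "<|endoftext|>"
```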

Pick the right base variant

bartowski publishes the kldzj heretic in two layouts. Use BF16 if your runner is Ollama:

| Variant | Attention tensors | Experts | Ollama? |
|---|---|---|---|
| MXFP4_MOE (~63 GB) | Q8_0 | MXFP4 | ❌ Ollama’s gpt-oss kernel panics on Q8_0 attention |
| bf16 (~65 GB) | BF16 | MXFP4 | ✅ Matches stock gpt-oss:120b layout |

Verifying tools work

curl -s http://localhost:11434/api/chat -d '{
  "model":"brianmatzelle/gpt-oss-heretic:120b",
  "messages":[{"role":"user","content":"What is 41 times 19? Use the calculator."}],
  "tools":[{"type":"function","function":{"name":"calculator","description":"multiplies two numbers","parameters":{"type":"object","properties":{"a":{"type":"number"},"b":{"type":"number"}},"required":["a","b"]}}}],
  "stream":false
}' | jq .message

You should get thinking + a clean tool_calls array — not a parser error.
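To assert that result programmatically, a small checker sketch; the sample payload below mirrors the expected /api/chat response shape and is illustrative, not captured model output:

```python
def extract_tool_calls(response: dict) -> list:
    """Return the tool calls from an Ollama /api/chat response, or raise if the server errored."""
    if "error" in response:
        raise RuntimeError(f"server error: {response['error']}")
    return response.get("message", {}).get("tool_calls", [])

# Illustrative response shape for the curl above (not real output)
sample = {
    "message": {
        "role": "assistant",
        "thinking": "41 * 19 — use the calculator tool",
        "tool_calls": [
            {"function": {"name": "calculator", "arguments": {"a": 41, "b": 19}}}
        ],
    }
}

calls = extract_tool_calls(sample)
assert calls and calls[0]["function"]["name"] == "calculator"
```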

Quality / safety notes

  • Heretic reports KL divergence ~0.5 for gpt-oss-120b — right at the threshold Heretic itself flags as “may indicate significant damage to original capabilities.” Tool-call accuracy is preserved in spot checks, but expect more variance than stock gpt-oss:120b on hard reasoning.
  • OpenAI built deep refusal behavior into gpt-oss; this build removes most of it. Treat outputs accordingly. Intended for personal research, not user-facing production.
  • License: Apache 2.0 (inherits from gpt-oss).

Hardware

Tested on an NVIDIA DGX Spark (GB10, 128 GB unified LPDDR5x). Sits at ~58 GB GPU weights + ~5 GB KV cache at 128k context. Should run on anything with 80+ GB of (V)RAM.

Credits

  • Base model: OpenAI — gpt-oss-120b
  • Heretic method: p-e-w
  • Abliterated weights: kldzj
  • GGUF quants: bartowski
  • Ollama packaging + metadata fixes: this repo