
GPT-OSS 120B (Heretic)

An abliterated build of openai/gpt-oss-120b, processed with Heretic to remove refusal behavior while preserving general capability. Repackaged so it actually runs in Ollama (the straight upstream GGUF doesn’t).

TL;DR

ollama pull brianmatzelle/gpt-oss-heretic:120b
  • Base: gpt-oss-120b (117B params, ~5B active, MoE)
  • Abliteration: Heretic v1.0.1, via kldzj/gpt-oss-120b-heretic → bartowski/kldzj_gpt-oss-120b-heretic-GGUF (BF16 variant)
  • Context: 131,072 tokens
  • Tools: ✅ working (verified end-to-end)
  • Thinking: ✅ working
  • Format: BF16 attention + MXFP4 experts (matches stock gpt-oss:120b exactly; ~65 GB)

Why this exists

The bartowski/kldzj heretic GGUFs are great, but none of them work in Ollama 0.24 out of the box. Symptoms range from “does not support tools” to “panic: failed to sample token” depending on which variant you try. Diagnosing this took a few hours.

This build is the bartowski BF16 variant with the following fixes baked in:

| Issue | Fix |
|---|---|
| GGUF arch keyed as `gpt-oss` (hyphen); Ollama’s sampler keys on `gptoss` | Rewrote all `gpt-oss.*` metadata keys → `gptoss.*` |
| `tokenizer.ggml.pre = gpt-4o`; Ollama expects `default` | Patched to `default` |
| No Modelfile `TEMPLATE`, so Ollama can’t render the harmony format | Copied the stock `gpt-oss:120b` Go template |
| Spurious `{{- end -}}` in `ollama show --modelfile` output | Stripped (Ollama bug: the round-trip isn’t faithful) |
| GGUF has only one `eos_token_id`; the harmony parser needs multiple stops | Added explicit `PARAMETER stop` for <\|return\|>, <\|call\|>, <\|endoftext\|> |
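The arch fix is a mechanical key rename. A sketch of that logic on a plain metadata dict (illustrative only: editing a real GGUF requires the gguf-py tooling, and the key names below are examples, not an exhaustive list):

```python
def rewrite_arch_keys(metadata: dict, old: str = "gpt-oss", new: str = "gptoss") -> dict:
    """Rename architecture-scoped GGUF keys so Ollama's sampler finds them,
    e.g. 'gpt-oss.context_length' -> 'gptoss.context_length'."""
    out = {}
    for key, value in metadata.items():
        if key == "general.architecture" and value == old:
            out[key] = new  # the arch name itself also changes
        elif key.startswith(old + "."):
            out[new + key[len(old):]] = value  # re-prefix gpt-oss.* -> gptoss.*
        else:
            out[key] = value  # everything else passes through untouched
    return out

fixed = rewrite_arch_keys({
    "general.architecture": "gpt-oss",
    "gpt-oss.context_length": 131072,
    "tokenizer.ggml.pre": "default",
})
```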

If you skip any of these, you get one of:

  • Error: error parsing tool call: invalid character '<' after top-level value (missing stop tokens)
  • Error: model 'X' does not support tools (missing template)
  • panic: failed to sample token (arch mismatch or wrong quant variant)
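The template and stop-token fixes live in the Modelfile. A minimal sketch, with placeholders: the FROM path is a hypothetical filename, and the TEMPLATE body (elided) must be the stock gpt-oss:120b Go template, e.g. recovered via `ollama show gpt-oss:120b --template`:

```
# Hypothetical local path to the bartowski BF16 GGUF
FROM ./kldzj_gpt-oss-120b-heretic-bf16.gguf

# Stock gpt-oss:120b Go template goes here (body omitted)
TEMPLATE """
...
"""

# Harmony stop tokens so the parser knows where a turn ends
PARAMETER stop "<|return|>"
PARAMETER stop "<|call|>"
PARAMETER stop "<|endoftext|>"
```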

Pick the right base variant

bartowski publishes the kldzj heretic in two layouts. Use BF16 if your runner is Ollama:

| Variant | Attention tensors | Experts | Ollama? |
|---|---|---|---|
| MXFP4_MOE (~63 GB) | Q8_0 | MXFP4 | ❌ Ollama’s gpt-oss kernel panics on Q8_0 attention |
| bf16 (~65 GB) | BF16 | MXFP4 | ✅ Matches stock gpt-oss:120b layout |

Verifying tools work

curl -s http://localhost:11434/api/chat -d '{
  "model":"brianmatzelle/gpt-oss-heretic:120b",
  "messages":[{"role":"user","content":"What is 41 times 19? Use the calculator."}],
  "tools":[{"type":"function","function":{"name":"calculator","description":"multiplies two numbers","parameters":{"type":"object","properties":{"a":{"type":"number"},"b":{"type":"number"}},"required":["a","b"]}}}],
  "stream":false
}' | jq .message

You should get thinking + a clean tool_calls array — not a parser error.
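To assert that result programmatically, a small checker sketch; the sample payload below mirrors the expected /api/chat response shape and is illustrative, not captured model output:

```python
def extract_tool_calls(response: dict) -> list:
    """Return the tool calls from an Ollama /api/chat response, or raise if the server errored."""
    if "error" in response:
        raise RuntimeError(f"server error: {response['error']}")
    return response.get("message", {}).get("tool_calls", [])

# Illustrative response shape for the curl above (not real output)
sample = {
    "message": {
        "role": "assistant",
        "thinking": "41 * 19 — use the calculator tool",
        "tool_calls": [
            {"function": {"name": "calculator", "arguments": {"a": 41, "b": 19}}}
        ],
    }
}

calls = extract_tool_calls(sample)
assert calls and calls[0]["function"]["name"] == "calculator"
```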

Quality / safety notes

  • Heretic reports KL divergence ~0.5 for gpt-oss-120b — right at the threshold Heretic itself flags as “may indicate significant damage to original capabilities.” Tool-call accuracy is preserved in spot checks, but expect more variance than stock gpt-oss:120b on hard reasoning.
  • OpenAI built deep refusal behavior into gpt-oss; this build removes most of it. Treat outputs accordingly. Intended for personal research, not user-facing production.
  • License: Apache 2.0 (inherits from gpt-oss).

Hardware

Tested on an NVIDIA DGX Spark (GB10, 128 GB unified LPDDR5x). Sits at ~58 GB GPU weights + ~5 GB KV cache at 128k context. Should run on anything with 80+ GB of (V)RAM.

Credits

  • Base model: OpenAI — gpt-oss-120b
  • Heretic method: p-e-w
  • Abliterated weights: kldzj
  • GGUF quants: bartowski
  • Ollama packaging + metadata fixes: this repo