Details

Updated yesterday

yesterday

c0969deb3529 · 8.6GB ·

model

archgemma4

parameters19.9B

quantizationQ2_K

8.6GB

template

{{- if or .System .Tools }}<bos><|turn>system {{ if .System }}{{ .System }} {{ end }}{{- if .Tools }

1.3kB

params

{ "num_ctx": 256000, "repeat_last_n": 256, "repeat_penalty": 1.15, "stop": [

150B

Gemma 4 26B-A4B 98e v5-coder — code-leaning expert prune

20.8B parameters · 98 experts (30 dropped) · code-axis drop map

Research checkpoint that takes Gemma-4-26B-A4B-it and drops 30 of 128 experts per layer using a code-targeted recipe (C6 layer-relevance-weighted v4-floor, breadth=50). Same router, same attention, same norms as base — only the expert keep-set changes. Compared to v4, this one protects code/math experts more tightly per layer.

Full model card, methodology, ablations, contamination audit: ManniX-ITA/gemma-4-A4B-98e-v5-coder-it on Hugging Face.

Other formats

Format	Repo	Notes
GGUF (this repo, llama.cpp / ollama)	`ManniX-ITA/gemma-4-A4B-98e-v5-coder-it-GGUF`	Bartowski tier sweep (Q2_K → Q8_0, IQ-series) + 5 ContribDynamic CD-* per-layer quants. F16 baseline included.
NVFP4A16 (vLLM)	`ManniX-ITA/gemma-4-A4B-98e-v5-coder-NVFP4A16`	~13 GB, native vLLM, produced via `modelopt==0.43.0`.
BF16 source weights	`ManniX-ITA/gemma-4-A4B-98e-v5-coder-it`	20.8B bf16; base for any further surgery / quant.

When to use this vs. v4

Pick v5-coder for: Python / JS / Rust code generation, HumanEval / LCB workloads, MATH-500-class problems. Wins on every code bench and on MATH-500.

Pick v4 for: general-purpose chat when you don't specifically need the code lean. Reasoning / GK / instruction-following benches are flat or slightly behind on v5-coder.

Quick start

# recommended default for most setups (≈14 GB VRAM)
ollama pull mannix/gemma4-98e-v5-coder:Q4_K_M

# best quality at moderate size (≈17 GB VRAM)
ollama pull mannix/gemma4-98e-v5-coder:Q6_K

# size-conscious (≈8 GB VRAM) — minimal quality loss on code
ollama pull mannix/gemma4-98e-v5-coder:CD-Q3_K_M

CD-* variants are ContribDynamic per-layer mixed quants — expert layers get more bits, attention/norm less. Roughly 5–10% faster than the matching plain quant at similar quality on code tasks.

Scores

NVFP4A16, vLLM, greedy decoding, thinking-token budget 12 288. Apples-to-apples against v4. Full per-task settings, output-length distributions, and HumanEval contamination smell test on the HF card.

Benchmark (n)	128e ref	98e v4	98e v5-coder	Δ (v5c − v4)
HumanEval-164 chat (pass@1)	96.95	96.95	98.17	+1.22
HumanEval+-164 chat (pass@1)	92.07	91.46	92.68	+1.22
LCB-medium-55 v4 (pass@1)	87.27	78.18	85.45	+7.27
MATH-500-100 (math_verify)	89.00	89.00	92.00	+3.00
IFEval-100 (prompt_strict)	95.00	93.00	94.00	+1.00
AIME 2024 (30)	36.67	36.67	36.67	0.00
GSM8K-100 (flex)	91.00	86.00	86.00	0.00
GPQA Diamond (198, flex)	73.23	69.19	68.69	−0.50

Reading the deltas: code wins are clean (HE +1.22, HE+ +1.22, LCB-medium +7.27, MATH-500 +3.00). Reasoning / GK / instruction-following stay flat. The +7.27 on LCB is well outside the ±2pp single-run noise floor on a 55-problem bench — that's the recipe's design intent.

How v5-coder stacks against the 14–22B coder / dense field

Full comparison + caveats (especially the LCB apples-to-oranges note) on the HF card.

Model	Params	HumanEval	HumanEval+	MATH-500	GPQA-D	IFEval
98e v5-coder (this)	20.8B / 4B MoE	98.17	92.68	92.00	68.69	94.00
Phi-4	14B dense	82.6	—	80.4	56.1	63.0
Qwen2.5-14B-Instruct	14.7B dense	81.7–86.2	—	73.0	40.9	80.0
Qwen2.5-Coder-14B-Instruct	14.7B dense	89.6	87.2	—	—	—
Codestral-22B v1	22B dense	81.1	—	—	—	—
Mistral-Small-3	24B dense	~84	—	70.6	45.3	82.1

LCB across these models is not apples-to-apples — different problem subsets / time windows. See HF card for the breakdown.

Template & parameters

Uses the Gemma 4 chat template with tool-use support and a 2nd-turn workaround for nested function calls. Default parameters baked into every tag:

PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.15
PARAMETER repeat_last_n 256
PARAMETER num_ctx 256000
PARAMETER stop <turn|>
PARAMETER stop <|tool_response>

License

Gemma Terms of Use. Use of this model implies acceptance.

v4 (multi-class CD-map, general-purpose): mannix/gemma4-98e-v4
128e original (no pruning): mannix/gemma4-98e
Project / scripts: github.com/mann1x/omnimergekit

Pruned to 98 experts gemma-4 a4b 26b v5-coder. Best 20b coder model overall