69 6 hours ago

Pruned to 98 experts gemma-4 a4b 26b v5-coder. Best 20b coder model overall

tools thinking
ollama run mannix/gemma4-98e-v5-coder:IQ2_M

Details

yesterday

f08f1d7ba33c · 8.2GB ·

gemma4
·
19.9B
·
IQ2_M
{{- if or .System .Tools }}<bos><|turn>system {{ if .System }}{{ .System }} {{ end }}{{- if .Tools }
{ "num_ctx": 256000, "repeat_last_n": 256, "repeat_penalty": 1.15, "stop": [

Readme

Gemma 4 26B-A4B 98e v5-coder — code-leaning expert prune

20.8B parameters · 98 experts (30 dropped) · code-axis drop map

Research checkpoint that takes Gemma-4-26B-A4B-it and drops 30 of 128 experts per layer using a code-targeted recipe (C6 layer-relevance-weighted v4-floor, breadth=50). Same router, same attention, same norms as base — only the expert keep-set changes. Compared to v4, this one protects code/math experts more tightly per layer.

Full model card, methodology, ablations, contamination audit: ManniX-ITA/gemma-4-A4B-98e-v5-coder-it on Hugging Face.

Other formats

FormatRepoNotes
GGUF (this repo, llama.cpp / ollama) ManniX-ITA/gemma-4-A4B-98e-v5-coder-it-GGUF Bartowski tier sweep (Q2_K → Q8_0, IQ-series) + 5 ContribDynamic CD-* per-layer quants. F16 baseline included.
NVFP4A16 (vLLM) ManniX-ITA/gemma-4-A4B-98e-v5-coder-NVFP4A16 ~13 GB, native vLLM, produced via modelopt==0.43.0.
BF16 source weights ManniX-ITA/gemma-4-A4B-98e-v5-coder-it 20.8B bf16; base for any further surgery / quant.

When to use this vs. v4

Pick v5-coder for: Python / JS / Rust code generation, HumanEval / LCB workloads, MATH-500-class problems. Wins on every code bench and on MATH-500.

Pick v4 for: general-purpose chat when you don't specifically need the code lean. Reasoning / GK / instruction-following benches are flat or slightly behind on v5-coder.

Quick start

# recommended default for most setups (≈14 GB VRAM)
ollama pull mannix/gemma4-98e-v5-coder:Q4_K_M

# best quality at moderate size (≈17 GB VRAM)
ollama pull mannix/gemma4-98e-v5-coder:Q6_K

# size-conscious (≈8 GB VRAM) — minimal quality loss on code
ollama pull mannix/gemma4-98e-v5-coder:CD-Q3_K_M

CD-* variants are ContribDynamic per-layer mixed quants — expert layers get more bits, attention/norm less. Roughly 5–10% faster than the matching plain quant at similar quality on code tasks.

Scores

NVFP4A16, vLLM, greedy decoding, thinking-token budget 12 288. Apples-to-apples against v4. Full per-task settings, output-length distributions, and HumanEval contamination smell test on the HF card.

Benchmark (n) 128e ref 98e v4 98e v5-coder Δ (v5c − v4)
HumanEval-164 chat (pass@1) 96.95 96.95 98.17 +1.22
HumanEval+-164 chat (pass@1) 92.07 91.46 92.68 +1.22
LCB-medium-55 v4 (pass@1) 87.27 78.18 85.45 +7.27
MATH-500-100 (math_verify) 89.00 89.00 92.00 +3.00
IFEval-100 (prompt_strict) 95.00 93.00 94.00 +1.00
AIME 2024 (30) 36.67 36.67 36.67 0.00
GSM8K-100 (flex) 91.00 86.00 86.00 0.00
GPQA Diamond (198, flex) 73.23 69.19 68.69 −0.50

Reading the deltas: code wins are clean (HE +1.22, HE+ +1.22, LCB-medium +7.27, MATH-500 +3.00). Reasoning / GK / instruction-following stay flat. The +7.27 on LCB is well outside the ±2pp single-run noise floor on a 55-problem bench — that's the recipe's design intent.

How v5-coder stacks against the 14–22B coder / dense field

Full comparison + caveats (especially the LCB apples-to-oranges note) on the HF card.

Model Params HumanEval HumanEval+ MATH-500 GPQA-D IFEval
98e v5-coder (this) 20.8B / 4B MoE 98.17 92.68 92.00 68.69 94.00
Phi-4 14B dense 82.6 80.4 56.1 63.0
Qwen2.5-14B-Instruct 14.7B dense 81.7–86.2 73.0 40.9 80.0
Qwen2.5-Coder-14B-Instruct 14.7B dense 89.6 87.2
Codestral-22B v1 22B dense 81.1
Mistral-Small-3 24B dense ~84 70.6 45.3 82.1

LCB across these models is not apples-to-apples — different problem subsets / time windows. See HF card for the breakdown.

Template & parameters

Uses the Gemma 4 chat template with tool-use support and a 2nd-turn workaround for nested function calls. Default parameters baked into every tag:

PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.15
PARAMETER repeat_last_n 256
PARAMETER num_ctx 256000
PARAMETER stop <turn|>
PARAMETER stop <|tool_response>

License

Gemma Terms of Use. Use of this model implies acceptance.

Related