The best 20B coding model just got better! It beats its bigger 26B brother in Python and code reasoning.

Details

Updated 1 month ago

1b88c06cd811 · 8.9GB ·

model

arch: gemma4

parameters: 19.9B

quantization: IQ3_XXS

8.9GB

template

<start_of_turn>{{- if or (eq .Role "system") (eq .Role "user") }}user {{- if (eq .Role "system") }}

585B

params

{ "min_p": 0.05, "num_ctx": 8192, "repeat_penalty": 1.1, "temperature": 0.6 }

69B

Gemma 4 20B-A4B 98e v6-coder

— LCB-targeted code prune —

20.8B params · 98 experts (30 dropped) · ~4B active · LiveCodeBench-targeted drop map

A research checkpoint that takes Gemma-4-26B-A4B-it and drops ³⁰⁄₁₂₈ experts per layer using a code-targeted recipe (C6 layer-relevance-weighted v4-floor, breadth=50), re-derived on corrected v3 code-pass calibration data and then steered specifically at LiveCodeBench-medium — the one code bench where expert pruning hurt most. Same router, attention, and norms as base, plus the mandatory shared-FFN α=1.2 upweight every coder variant carries.
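As a rough illustration of what a per-layer drop map looks like, the sketch below removes the 30 lowest-scoring experts out of 128 in each layer. It is not the C6 recipe described above: the scores are synthetic, and `build_drop_map` and `toy_scores` are hypothetical names introduced only for this example. The shared-FFN α=1.2 upweight is not modelled.

```python
from typing import Dict, List


def build_drop_map(layer_scores: List[List[float]], n_drop: int = 30) -> Dict[int, List[int]]:
    """Map each layer index to the sorted indices of its n_drop lowest-scoring experts.

    Generic lowest-score pruning, used here only to illustrate the shape of a
    drop map. The real recipe's scoring and floor rules are not reproduced.
    """
    drop_map: Dict[int, List[int]] = {}
    for layer, scores in enumerate(layer_scores):
        if n_drop >= len(scores):
            raise ValueError("n_drop must be smaller than the number of experts")
        ranked = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending score
        drop_map[layer] = sorted(ranked[:n_drop])
    return drop_map


# Synthetic scores (assumption): 2 layers x 128 experts, each layer a permutation of 0..127.
toy_scores = [[(i * 37 + layer * 11) % 128 for i in range(128)] for layer in range(2)]

drop_map = build_drop_map(toy_scores, n_drop=30)
kept_per_layer = [128 - len(drop_map[layer]) for layer in range(2)]
print(kept_per_layer)
```

With 30 of 128 experts dropped in every layer, each layer keeps 98 experts, which is where the "98e" in the model name comes from.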

Successor to v5-coder: essentially tied on HumanEval (−0.61 HE / identical HE+) while gaining +10.91pp on LCB-medium — closing the prune hole and pushing +1.81pp past the unpruned 128e.
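To put the HumanEval deltas in scale: the benchmark has 164 problems, so a single problem is worth about 0.61pp. The short sketch below converts the pass@1 percentages quoted on this page back into solved-problem counts; `solved_count` is a helper name introduced here for illustration.

```python
HUMANEVAL_PROBLEMS = 164  # HumanEval and HumanEval+ share the same 164 problems


def solved_count(pass_at_1_percent: float, n_problems: int = HUMANEVAL_PROBLEMS) -> int:
    """Convert a pass@1 percentage into the nearest whole number of solved problems."""
    return round(pass_at_1_percent / 100 * n_problems)


step_pp = 100 / HUMANEVAL_PROBLEMS  # percentage points per problem
he_solved = solved_count(98.78)     # HE score listed for Q6_K
hep_solved = solved_count(93.29)    # HE+ score listed for Q6_K

print(f"one problem = {step_pp:.2f}pp, HE = {he_solved}/164, HE+ = {hep_solved}/164")
```

Read this way, the −0.61 HE gap to v5-coder corresponds to a single problem.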

Full model card & methodology: ManniX-ITA/gemma-4-A4B-98e-v6-coder-it on Hugging Face.

Other formats:

- GGUF (full tier sweep, imatrix, all HE+-scored): ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF

Scores (Q6_K, llama.cpp, greedy, EVAL_PROTOCOL v3)

HE	HE+	LCB-med-55 v4	LCB-med-100	MultiPL-E macro	MATH-500	GPQA-D	AIME	IFEval
98.78	93.29	96.36	96.00	88.00	91.00	67.17	63.33	92.00

Top of the 14–22B coder band: +9.2pp HE over Qwen2.5-Coder-14B-Instruct (89.6 → 98.78), and LCB-targeting pushed it past the unpruned 128e on every code axis.

Per-quant HE+ (164q, chat-mode pass@1, greedy)

Plain K-quants rebuilt with imatrix (calibration-data-v5) — biggest lift on the low-bit tiers (Q3_K_M 85.98 → 92.68). Q4_K_M is the one exception, built without imatrix (imatrix lowered it: 90.85 vs 92.07 plain). IQ-tiers always carry imatrix.

Quant	Size (GB)	bpw	HE+ pass@1
F16	39.79	16.00	— (baseline)
Q8_0	21.16	8.14	93.29%
Q6_K_L	17.98	6.91	92.68%
Q6_K	17.81	6.84	93.29%
CD-Q6_K	15.54	5.97	92.07%
Q5_K_L	15.25	5.86	91.46%
Q5_K_M	15.07	5.79	92.68%
Q5_K_S	14.19	5.45	92.07%
Q4_K_L	13.42	5.16	92.68%
Q4_K_M	13.24	5.09	92.07% (plain)
CD-Q5_K_M	13.07	5.03	90.85%
Q4_1	12.61	4.85	91.46%
Q4_K_S	12.21	4.69	92.68%
IQ4_NL	11.42	4.39	91.46%
Q4_0	11.42	4.39	90.85%
⭐ IQ4_XS	11.01	4.23	92.07%
Q3_K_L	10.94	4.21	92.07%
Q3_K_XL	10.69	4.11	92.07%
CD-Q4_K_M	10.65	4.10	91.46%
⭐ Q3_K_M	10.51	4.04	92.68% (value leader)
CD-IQ4_K_M	10.29	3.96	91.46%
IQ3_M	9.82	3.78	92.07%
Q3_K_S	9.68	3.72	87.80%
IQ3_XS	9.22	3.54	92.07%
IQ3_XXS	8.95	3.44	90.85%
IQ2_M	8.22	3.16	90.24%
IQ2_S	7.83	3.01	89.02%
IQ2_XS	7.77	2.99	70.73% (2-bit cliff)

Recommended: IQ4_XS (11.01 GB) is the safe 4-bit default; Q3_K_M (10.51 GB) is the smallest tier in the 92.68% top cluster (value leader); IQ3_XS (9.22 GB) the smallest on the 92.07% plateau; IQ2_S (7.83 GB) for sub-8 GB. Avoid IQ2_XS.
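The same choice can be made mechanically from the table. The sketch below copies the scored rows (F16 has no score and is left out) and picks the highest HE+ tier that fits a disk budget, breaking ties by the smaller file. `TIERS` and `best_under` are names introduced for this example, and HE+ is the only criterion considered.

```python
from typing import List, Optional, Tuple

# (quant, size in GB, HE+ pass@1 in %), copied from the per-quant table.
TIERS: List[Tuple[str, float, float]] = [
    ("Q8_0", 21.16, 93.29), ("Q6_K_L", 17.98, 92.68), ("Q6_K", 17.81, 93.29),
    ("CD-Q6_K", 15.54, 92.07), ("Q5_K_L", 15.25, 91.46), ("Q5_K_M", 15.07, 92.68),
    ("Q5_K_S", 14.19, 92.07), ("Q4_K_L", 13.42, 92.68), ("Q4_K_M", 13.24, 92.07),
    ("CD-Q5_K_M", 13.07, 90.85), ("Q4_1", 12.61, 91.46), ("Q4_K_S", 12.21, 92.68),
    ("IQ4_NL", 11.42, 91.46), ("Q4_0", 11.42, 90.85), ("IQ4_XS", 11.01, 92.07),
    ("Q3_K_L", 10.94, 92.07), ("Q3_K_XL", 10.69, 92.07), ("CD-Q4_K_M", 10.65, 91.46),
    ("Q3_K_M", 10.51, 92.68), ("CD-IQ4_K_M", 10.29, 91.46), ("IQ3_M", 9.82, 92.07),
    ("Q3_K_S", 9.68, 87.80), ("IQ3_XS", 9.22, 92.07), ("IQ3_XXS", 8.95, 90.85),
    ("IQ2_M", 8.22, 90.24), ("IQ2_S", 7.83, 89.02), ("IQ2_XS", 7.77, 70.73),
]


def best_under(budget_gb: float) -> Optional[Tuple[str, float, float]]:
    """Return the tier with the highest HE+ that fits the budget (smaller file wins ties)."""
    fits = [tier for tier in TIERS if tier[1] <= budget_gb]
    if not fits:
        return None
    return max(fits, key=lambda tier: (tier[2], -tier[1]))


for budget in (8.0, 9.5, 11.1, 25.0):
    print(budget, best_under(budget))
```

Note that file size is not the whole memory footprint: the KV cache at the chosen context length comes on top, so leave headroom below the card's VRAM.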

Head-to-head vs Qwen2.5-Coder-14B-Instruct — by file size

Same rig (single RTX 3090), same recipe (llama-server -c 32768 -ngl 99 --parallel 2, omk_eval, humanevalplus_full, greedy T=0). Qwen GGUFs are bartowski’s Qwen2.5-Coder-14B-Instruct-GGUF. Pairing by tier name is misleading because v6-coder is a 20.8B MoE and Qwen is 14.7B dense, so the fair comparison is iso-disk: at a given GB budget, which model wins HE+? In the paired rows below, v6-coder runs about 1.3–2.8 bpw lower than the Qwen file it is matched against and still scores higher in every band.

Disk band	Qwen2.5-Coder-14B (size / bpw / HE+)	v6-coder best (size / bpw / HE+)	Δ HE+
~21 GB	(none — Qwen ceiling is Q8_0 15.70)	Q8_0 21.16 / 8.14 / 93.29%	—
~18 GB	(none)	Q6_K 17.81 / 6.84 / 93.29%	new top
~15.7 GB	Q8_0 15.70 / 8.54 / 84.76%	Q5_K_M 15.07 / 5.79 / 92.68%	+7.92
~13 GB	Q6_K 12.12 / 6.60 / 84.76%	Q4_K_L 13.42 / 5.16 / 92.68%	+7.92
~12 GB	Q6_K 12.12 / 6.60 / 84.76%	Q4_K_S 12.21 / 4.69 / 92.68%	+7.92
~11 GB	Q5_K_M 10.51 / 5.72 / 83.54%	⭐ IQ4_XS 11.01 / 4.23 / 92.07%	+8.53
~10.5 GB (iso-disk)	Q5_K_M 10.51 / 5.72 / 83.54%	Q3_K_M 10.51 / 4.04 / 92.68%	+9.14
~10 GB	Q5_K_M 10.51 / 5.72 / 83.54%	⭐ IQ3_M 9.82 / 3.78 / 92.07%	+8.53
~9.2 GB	Q4_K_M 8.99 / 4.89 / 85.37%	IQ3_XS 9.22 / 3.54 / 92.07%	+6.70
~9 GB	Q4_K_M 8.99 / 4.89 / 85.37%	IQ3_XXS 8.95 / 3.44 / 90.85%	+5.48
~8 GB	IQ4_XS 8.12 / 4.42 / 84.76%	IQ2_M 8.22 / 3.16 / 90.24%	+5.48
~7.8 GB	IQ4_XS 8.12 / 4.42 / 84.76%	IQ2_S 7.83 / 3.01 / 89.02%	+4.26

Qwen sits at 83–85% HE+ across its whole ladder; the iso-disk ~10.5 GB point is the cleanest read — v6 Q3_K_M (4.04 bpw / 92.68%) vs Qwen Q5_K_M (5.72 bpw / 83.54%): +9.14pp at the exact same file size, −1.68 bpw. Same-rig LCB-medium-55 v4 on the identical split: 96.36 vs 18.18 (read as a same-rig measurement — v6-coder was specifically LCB-v4-targeted; not a general-coding ranking).
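The bpw gap at equal file size follows directly from the parameter counts. The sketch below assumes GB means 10^9 bytes and uses the 20.8B and 14.7B totals quoted above; under those assumptions it reproduces the listed bpw values for the 10.51 GB pair (other rows agree to within rounding). `bits_per_weight` is a helper name introduced for this example.

```python
def bits_per_weight(file_gb: float, params_billions: float) -> float:
    """Average bits stored per parameter, assuming 1 GB = 10^9 bytes."""
    return file_gb * 8 / params_billions


v6_bpw = bits_per_weight(10.51, 20.8)    # v6-coder Q3_K_M
qwen_bpw = bits_per_weight(10.51, 14.7)  # Qwen2.5-Coder-14B Q5_K_M
gap = qwen_bpw - v6_bpw

print(f"v6 {v6_bpw:.2f} bpw, Qwen {qwen_bpw:.2f} bpw, gap {gap:.2f} bpw")
```

At the same disk size, the model with more total parameters necessarily stores fewer bits per weight, which is why iso-disk pairs put a lower-bit v6-coder tier against a higher-bit Qwen tier.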

Pull

ollama pull mannix/gemma4-98e-v6-coder            # :latest = Q4_K_M
ollama pull mannix/gemma4-98e-v6-coder:IQ4_XS     # safe 4-bit (92.07%)
ollama pull mannix/gemma4-98e-v6-coder:Q3_K_M     # value leader, 10.5 GB (92.68%)
ollama pull mannix/gemma4-98e-v6-coder:Q6_K       # max fidelity (93.29%)
ollama pull mannix/gemma4-98e-v6-coder:IQ2_S      # sub-8 GB (89.02%)
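Once a tag is pulled, it can be queried through Ollama's local REST API. The sketch below is a minimal example that assumes an Ollama server on the default `http://localhost:11434` and the `:IQ4_XS` tag. It passes the sampling parameters listed in params above and allows overrides, for example `temperature=0` to approximate the greedy setting used in the evals. `build_payload` and `generate` are helper names introduced here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local server
MODEL_TAG = "mannix/gemma4-98e-v6-coder:IQ4_XS"

# Defaults listed in the params block on this page.
DEFAULT_OPTIONS = {"min_p": 0.05, "num_ctx": 8192, "repeat_penalty": 1.1, "temperature": 0.6}


def build_payload(prompt: str, **overrides) -> dict:
    """Build a non-streaming /api/generate request body with optional option overrides."""
    options = {**DEFAULT_OPTIONS, **overrides}
    return {"model": MODEL_TAG, "prompt": prompt, "stream": False, "options": options}


def generate(prompt: str, **overrides) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(prompt, **overrides)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Usage (requires a running server and the pulled tag):
# print(generate("Write a Python function that reverses a linked list.", temperature=0))
```

The model ships with these options as its defaults, so passing them explicitly only matters when you want to override one of them.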

Derivative of Gemma 4 — Gemma Terms of Use.