
The best 20B coding model just got better! Beats its bigger 26B brother in Python and code reasoning.

Capabilities: tools, thinking
ollama run mannix/gemma4-98e-v6-coder:IQ4_NL

Details


c8cc3eccb9ad · 11GB · gemma4 · 19.9B · IQ4_NL

Readme

Gemma 4 26B-A4B 98e v6-coder

— LCB-targeted code prune —

20.8B params · 98 experts (30 dropped) · ~4B active · LiveCodeBench-targeted drop map

A research checkpoint that takes Gemma-4-26B-A4B-it and drops 30 of its 128 experts per layer using a code-targeted recipe (C6 layer-relevance-weighted v4-floor, breadth=50). The drop map was re-derived on corrected v3 code-pass calibration data and then steered specifically at LiveCodeBench-medium, the one code benchmark where expert pruning hurt most. Router, attention, and norms are the same as the base model, plus the mandatory shared-FFN α=1.2 upweight that every coder variant carries.
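The card describes the pruning only at a high level. As a rough illustration of what "drop 30 of 128 experts, keep the same router, upweight the shared FFN" can mean mechanically, here is a minimal pure-Python sketch. It is not the author's code. It assumes softmax top-k routing restricted to the surviving experts, a plain highest-score keep set standing in for the C6 drop map, and that the α=1.2 upweight multiplies the shared FFN's output. The names `keep_set`, `pruned_moe_forward`, and `top_k` are hypothetical, and the card does not state the real k.

```python
# Minimal sketch of a pruned MoE FFN layer. ASSUMPTIONS (not from the model card):
# softmax top-k routing over surviving experts only, experts as plain callables,
# and the shared-FFN upweight applied as a scalar on the shared FFN output.
import math
from typing import Callable, Sequence

Vector = list[float]
Expert = Callable[[Vector], Vector]


def keep_set(scores: Sequence[float], n_keep: int) -> list[int]:
    """Indices of the n_keep highest-scoring experts, returned in original order.

    Hypothetical stand-in for a calibration-derived drop map (the C6 recipe is not reproduced).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def pruned_moe_forward(
    x: Vector,
    router_rows: Sequence[Vector],  # one router weight row per ORIGINAL expert
    experts: Sequence[Expert],      # one callable per ORIGINAL expert
    kept: Sequence[int],            # surviving expert indices
    shared_ffn: Expert,
    top_k: int,                     # hypothetical; the card does not give k
    alpha: float = 1.2,             # shared-FFN upweight quoted in the card
) -> Vector:
    # Router weights are untouched: only rows of surviving experts are evaluated.
    logits = {i: sum(w * xi for w, xi in zip(router_rows[i], x)) for i in kept}
    chosen = sorted(logits, key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the chosen logits (shifted by the max for numerical stability).
    m = max(logits[i] for i in chosen)
    gates = {i: math.exp(logits[i] - m) for i in chosen}
    total = sum(gates.values())
    # The shared FFN is always active and scaled by alpha.
    out = [alpha * v for v in shared_ffn(x)]
    for i in chosen:
        for d, v in enumerate(experts[i](x)):
            out[d] += gates[i] / total * v
    return out


# Toy usage: 4 "original" experts, keep 3, route each token to 2.
kept = keep_set([0.9, 0.1, 0.5, 0.7], n_keep=3)  # kept == [0, 2, 3]
y = pruned_moe_forward(
    [1.0, 0.0],
    router_rows=[[1.0, 0.0], [9.0, 0.0], [3.0, 0.0], [4.0, 0.0]],
    experts=[lambda v, i=i: [float(i)] * len(v) for i in range(4)],
    kept=kept,
    shared_ffn=lambda v: [1.0] * len(v),
    top_k=2,
)
# Expert 1 has the largest router logit but was dropped, so experts 3 and 2 are used.
```

In this sketch, slicing the router instead of retraining it is what leaves the base router weights unchanged. The card does not say whether the real checkpoint renormalizes gates exactly this way.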

Successor to v5-coder: essentially tied on HumanEval (−0.61pp HE, identical HE+) while gaining +10.91pp on LCB-medium. That closes the pruning gap and puts it +1.81pp past the unpruned 128e.
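HE/HE+ have 164 problems and LCB-med-55 has 55. One problem is therefore worth 100/164 ≈ 0.61pp on HumanEval and 100/55 ≈ 1.82pp on LCB-med-55, so the deltas above translate into whole problems. The short Python sketch below does the conversion. It assumes the LCB-medium deltas refer to the 55-problem split. The card does not say so explicitly, but the 10.91 and 1.81 values are consistent with it.

```python
# Convert pass@1 percentages and percentage-point deltas into problem counts.
# ASSUMPTION: the LCB-medium deltas are on the 55-problem LCB-med-55 v4 split.

HE_N = 164       # HumanEval / HumanEval+ problem count
LCB_MED_N = 55   # LCB-medium-55 split size


def pp_to_problems(delta_pp: float, n_problems: int) -> int:
    """Round a percentage-point delta to the nearest whole number of problems."""
    return round(delta_pp * n_problems / 100.0)


def score_to_solved(score_pct: float, n_problems: int) -> int:
    """Round a pass@1 percentage back to the number of problems solved."""
    return round(score_pct * n_problems / 100.0)


print(pp_to_problems(-0.61, HE_N))        # v6 vs v5 on HE: -1
print(pp_to_problems(10.91, LCB_MED_N))   # v6 vs v5 on LCB-medium: 6
print(pp_to_problems(1.81, LCB_MED_N))    # v6 vs unpruned 128e: 1
print(score_to_solved(96.36, LCB_MED_N))  # v6 on LCB-med-55: 53 of 55
print(score_to_solved(98.78, HE_N))       # v6 on HE: 162 of 164
```

Under that assumption, the margin over the unpruned 128e is a single LCB-med-55 problem, and the HE difference to v5 is also one problem.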

Full model card & methodology: ManniX-ITA/gemma-4-A4B-98e-v6-coder-it on Hugging Face.

Other formats:
- GGUF (full tier sweep, imatrix, all HE+-scored): ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF

Scores (Q6_K, llama.cpp, greedy, EVAL_PROTOCOL v3)

HE    | HE+   | LCB-med-55 v4 | LCB-med-100 | MultiPL-E macro | MATH-500 | GPQA-D | AIME  | IFEval
98.78 | 93.29 | 96.36         | 96.00       | 88.00           | 91.00    | 67.17  | 63.33 | 92.00

Top of the 14–22B coder band: +9.2pp HE over Qwen2.5-Coder-14B-Instruct (89.6 → 98.78), and LCB-targeting pushed it past the unpruned 128e on every code axis.

Per-quant HE+ (164 problems, chat-mode pass@1, greedy)

Plain K-quants were rebuilt with an imatrix (calibration-data-v5), which gives the biggest lift on the low-bit tiers (Q3_K_M 85.98 → 92.68). Q4_K_M is the one exception: it is built without an imatrix because the imatrix lowered its score (90.85 vs 92.07 plain). IQ tiers always carry an imatrix.
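The imatrix rule can be written out as llama.cpp build commands. This sketch only assembles and prints the command lines. It assumes the `llama-imatrix` and `llama-quantize` binaries from a recent llama.cpp build, and every file name in it (including the calibration text path) is hypothetical. Every tier except Q4_K_M gets `--imatrix`, which mirrors the exception described above.

```python
# Sketch of the imatrix-vs-plain build rule as llama.cpp command lines.
# File names are HYPOTHETICAL; only the tool names and the -m/-f/-o and
# --imatrix options come from llama.cpp.
import shlex

F16 = "gemma-4-A4B-98e-v6-coder-it-F16.gguf"  # hypothetical input path
CALIB = "calibration-data-v5.txt"             # hypothetical calibration text path
IMATRIX = "imatrix.dat"                       # hypothetical imatrix output path

# Per the card: Q4_K_M is built plain because the imatrix lowered its HE+.
PLAIN_ONLY = {"Q4_K_M"}


def imatrix_cmd() -> list[str]:
    """Compute the importance matrix once from the calibration text."""
    return ["llama-imatrix", "-m", F16, "-f", CALIB, "-o", IMATRIX]


def quantize_cmd(qtype: str) -> list[str]:
    """Quantize one tier, attaching the imatrix unless the tier is plain-only."""
    out = F16.replace("F16", qtype)
    cmd = ["llama-quantize"]
    if qtype not in PLAIN_ONLY:
        cmd += ["--imatrix", IMATRIX]
    return cmd + [F16, out, qtype]


for argv in [imatrix_cmd(), quantize_cmd("Q3_K_M"), quantize_cmd("Q4_K_M"), quantize_cmd("IQ4_XS")]:
    print(shlex.join(argv))
```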

Quant      | Size (GB) | bpw   | HE+ pass@1
F16        | 39.79     | 16.00 | n/a (baseline)
Q8_0       | 21.16     | 8.14  | 93.29%
Q6_K_L     | 17.98     | 6.91  | 92.68%
Q6_K       | 17.81     | 6.84  | 93.29%
CD-Q6_K    | 15.54     | 5.97  | 92.07%
Q5_K_L     | 15.25     | 5.86  | 91.46%
Q5_K_M     | 15.07     | 5.79  | 92.68%
Q5_K_S     | 14.19     | 5.45  | 92.07%
Q4_K_L     | 13.42     | 5.16  | 92.68%
Q4_K_M     | 13.24     | 5.09  | 92.07% (plain)
CD-Q5_K_M  | 13.07     | 5.03  | 90.85%
Q4_1       | 12.61     | 4.85  | 91.46%
Q4_K_S     | 12.21     | 4.69  | 92.68%
IQ4_NL     | 11.42     | 4.39  | 91.46%
Q4_0       | 11.42     | 4.39  | 90.85%
IQ4_XS     | 11.01     | 4.23  | 92.07%
Q3_K_L     | 10.94     | 4.21  | 92.07%
Q3_K_XL    | 10.69     | 4.11  | 92.07%
CD-Q4_K_M  | 10.65     | 4.10  | 91.46%
Q3_K_M     | 10.51     | 4.04  | 92.68% (value leader)
CD-IQ4_K_M | 10.29     | 3.96  | 91.46%
IQ3_M      | 9.82      | 3.78  | 92.07%
Q3_K_S     | 9.68      | 3.72  | 87.80%
IQ3_XS     | 9.22      | 3.54  | 92.07%
IQ3_XXS    | 8.95      | 3.44  | 90.85%
IQ2_M      | 8.22      | 3.16  | 90.24%
IQ2_S      | 7.83      | 3.01  | 89.02%
IQ2_XS     | 7.77      | 2.99  | 70.73% (2-bit cliff)

Recommended: IQ4_XS (11.01 GB) is the safe 4-bit default. Q3_K_M (10.51 GB) is the value leader: the smallest tier in the 92.68% cluster, just below the 93.29% peak. IQ3_XS (9.22 GB) is the smallest tier on the 92.07% plateau, and IQ2_S (7.83 GB) covers sub-8 GB budgets. Avoid IQ2_XS (70.73%).
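The recommendations follow two simple rules. One picks the smallest tier that still reaches a given HE+ score, and the other picks the best-scoring tier under a size budget. This sketch applies both rules to a subset of the per-quant table. The helper names are illustrative and not part of any tool. The omitted tiers do not change any of the three results below.

```python
# Selection rules behind the recommendations, applied to a subset of the
# per-quant table (size in GB, HE+ pass@1 in %). Helper names are illustrative.
TIERS = {
    "Q6_K": (17.81, 93.29),
    "Q4_K_S": (12.21, 92.68),
    "IQ4_XS": (11.01, 92.07),
    "Q3_K_M": (10.51, 92.68),
    "IQ3_M": (9.82, 92.07),
    "IQ3_XS": (9.22, 92.07),
    "IQ3_XXS": (8.95, 90.85),
    "IQ2_M": (8.22, 90.24),
    "IQ2_S": (7.83, 89.02),
    "IQ2_XS": (7.77, 70.73),
}


def smallest_reaching(score: float) -> str:
    """Smallest tier whose HE+ is at least `score`."""
    ok = [(size, name) for name, (size, he) in TIERS.items() if he >= score]
    return min(ok)[1]


def best_under(budget_gb: float) -> str:
    """Highest-scoring tier strictly under the budget (ties go to the smaller file)."""
    ok = [(-he, size, name) for name, (size, he) in TIERS.items() if size < budget_gb]
    return min(ok)[2]


print(smallest_reaching(92.68))  # Q3_K_M
print(smallest_reaching(92.07))  # IQ3_XS
print(best_under(8.0))           # IQ2_S
```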

Head-to-head vs Qwen2.5-Coder-14B-Instruct — by file size

Same rig (single RTX 3090), same recipe (llama-server -c 32768 -ngl 99 --parallel 2, omk_eval, humanevalplus_full, greedy T=0). Qwen GGUFs are bartowski's Qwen2.5-Coder-14B-Instruct-GGUF. Pairing by tier name is misleading because v6-coder is a 20.8B MoE and Qwen is a 14.7B dense model. The fair comparison is therefore iso-disk: at a given GB budget, which model scores higher on HE+? At comparable file sizes, v6-coder runs roughly 1.3–2.8 bpw lower than Qwen and still scores higher in every band.

Disk band           | Qwen2.5-Coder-14B (size / bpw / HE+)      | v6-coder best (size / bpw / HE+) | Δ HE+ (pp)
~21 GB              | none (Qwen's ceiling is Q8_0 at 15.70 GB) | Q8_0 21.16 / 8.14 / 93.29%       | n/a
~18 GB              | none                                      | Q6_K 17.81 / 6.84 / 93.29%       | new top
~15.7 GB            | Q8_0 15.70 / 8.54 / 84.76%                | Q5_K_M 15.07 / 5.79 / 92.68%     | +7.92
~13 GB              | Q6_K 12.12 / 6.60 / 84.76%                | Q4_K_L 13.42 / 5.16 / 92.68%     | +7.92
~12 GB              | Q6_K 12.12 / 6.60 / 84.76%                | Q4_K_S 12.21 / 4.69 / 92.68%     | +7.92
~11 GB              | Q5_K_M 10.51 / 5.72 / 83.54%              | IQ4_XS 11.01 / 4.23 / 92.07%     | +8.53
~10.5 GB (iso-disk) | Q5_K_M 10.51 / 5.72 / 83.54%              | Q3_K_M 10.51 / 4.04 / 92.68%     | +9.14
~10 GB              | Q5_K_M 10.51 / 5.72 / 83.54%              | IQ3_M 9.82 / 3.78 / 92.07%       | +8.53
~9.2 GB             | Q4_K_M 8.99 / 4.89 / 85.37%               | IQ3_XS 9.22 / 3.54 / 92.07%      | +6.70
~9 GB               | Q4_K_M 8.99 / 4.89 / 85.37%               | IQ3_XXS 8.95 / 3.44 / 90.85%     | +5.48
~8 GB               | IQ4_XS 8.12 / 4.42 / 84.76%               | IQ2_M 8.22 / 3.16 / 90.24%       | +5.48
~7.8 GB             | IQ4_XS 8.12 / 4.42 / 84.76%               | IQ2_S 7.83 / 3.01 / 89.02%       | +4.26

Qwen sits at 83–85% HE+ across its whole ladder. The iso-disk ~10.5 GB point is the cleanest read: v6 Q3_K_M (4.04 bpw, 92.68%) vs Qwen Q5_K_M (5.72 bpw, 83.54%), i.e. +9.14pp at exactly the same file size with 1.68 fewer bpw. Same-rig LCB-medium-55 v4 on the identical split: 96.36 vs 18.18. Read this as a same-rig measurement, not a general coding ranking, since v6-coder was specifically targeted at LCB v4.
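The bpw figures follow from file size and parameter count: bpw = bytes × 8 / parameters. This sketch assumes decimal gigabytes and the 20.8B and 14.7B parameter counts quoted above. Under those assumptions it reproduces the quantized tiers' bpw columns to within about 0.01 and recovers the numbers of the ~10.5 GB iso-disk pairing. It does not reproduce the F16 row's 16.00.

```python
# bpw = file bytes * 8 / parameter count.
# ASSUMPTIONS: sizes are decimal GB (1e9 bytes); parameter counts are 20.8B
# (v6-coder) and 14.7B (Qwen2.5-Coder-14B) as stated in the text.

def bpw(size_gb: float, params_b: float) -> float:
    """Average bits per weight implied by a GGUF file size (the 1e9 factors cancel)."""
    return size_gb * 8 / params_b


V6_PARAMS_B = 20.8
QWEN_PARAMS_B = 14.7

v6 = ("Q3_K_M", 10.51, 92.68)    # (tier, size GB, HE+ %)
qwen = ("Q5_K_M", 10.51, 83.54)

v6_bpw = bpw(v6[1], V6_PARAMS_B)
qwen_bpw = bpw(qwen[1], QWEN_PARAMS_B)
print(f"v6 {v6[0]}: {v6_bpw:.2f} bpw, Qwen {qwen[0]}: {qwen_bpw:.2f} bpw")
print(f"bpw gap: {qwen_bpw - v6_bpw:.2f}, HE+ delta: {v6[2] - qwen[2]:+.2f}pp")
```

Because v6-coder spreads the same bytes over more parameters, it necessarily runs at a lower bpw than Qwen at equal file size. The iso-disk framing makes that trade-off explicit.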

Pull

ollama pull mannix/gemma4-98e-v6-coder            # :latest = Q4_K_M
ollama pull mannix/gemma4-98e-v6-coder:IQ4_XS     # safe 4-bit (92.07%)
ollama pull mannix/gemma4-98e-v6-coder:Q3_K_M     # value leader, 10.5 GB (92.68%)
ollama pull mannix/gemma4-98e-v6-coder:Q6_K       # max fidelity (93.29%)
ollama pull mannix/gemma4-98e-v6-coder:IQ2_S      # sub-8 GB (89.02%)

Derivative of Gemma 4 — Gemma Terms of Use.