ollama run mannix/gemma4-98e-v6-coder:IQ3_XXS
— LCB-targeted code prune —
20.8B params · 98 experts (30 dropped) · ~4B active · LiveCodeBench-targeted drop map
A research checkpoint that takes Gemma-4-26B-A4B-it and drops 30 of 128 experts per layer using a code-targeted recipe (C6 layer-relevance-weighted v4-floor, breadth=50). The drop map was re-derived on corrected v3 code-pass calibration data and then steered specifically at LiveCodeBench-medium, the one code bench where expert pruning hurt most. Router, attention, and norms are the same as base, plus the mandatory shared-FFN α=1.2 upweight every coder variant carries.
Successor to v5-coder: essentially tied on HumanEval (−0.61pp HE, identical HE+) while gaining +10.91pp on LCB-medium, which closes the prune hole and puts it +1.81pp above the unpruned 128-expert base (128e).
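To make "drop map" concrete, here is a toy sketch of per-layer expert selection. It is not the C6 recipe: the scores are random placeholders, `NUM_LAYERS` is a hypothetical small value, and the floor and breadth settings named above are not modelled. It only shows the shape of the result, a list of 30 dropped expert ids per layer that leaves 98.

```python
import random

NUM_EXPERTS = 128   # experts per layer in the base model (from the card)
NUM_DROPPED = 30    # experts removed per layer (from the card)
NUM_LAYERS = 4      # hypothetical: small value for illustration only


def build_drop_map(scores, num_dropped):
    """Return {layer: sorted ids of the lowest-scoring experts in that layer}."""
    drop_map = {}
    for layer, layer_scores in enumerate(scores):
        # Rank expert ids by ascending score; the weakest come first.
        ranked = sorted(range(len(layer_scores)), key=lambda e: layer_scores[e])
        drop_map[layer] = sorted(ranked[:num_dropped])
    return drop_map


rng = random.Random(0)
# Placeholder scores: random numbers stand in for real calibration statistics.
scores = [[rng.random() for _ in range(NUM_EXPERTS)] for _ in range(NUM_LAYERS)]

drop_map = build_drop_map(scores, NUM_DROPPED)
kept_per_layer = NUM_EXPERTS - len(drop_map[0])
print(kept_per_layer)  # 98
```

The actual scoring and targeting are described in the full model card linked below.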
Full model card & methodology: ManniX-ITA/gemma-4-A4B-98e-v6-coder-it on Hugging Face.
Other formats:
- GGUF (full tier sweep, imatrix, all HE+-scored): ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF
| HE | HE+ | LCB-med-55 v4 | LCB-med-100 | MultiPL-E macro | MATH-500 | GPQA-D | AIME | IFEval |
|---|---|---|---|---|---|---|---|---|
| 98.78 | 93.29 | 96.36 | 96.00 | 88.00 | 91.00 | 67.17 | 63.33 | 92.00 |
Top of the 14–22B coder band: +9.2pp HE over Qwen2.5-Coder-14B-Instruct (89.6 → 98.78), and LCB-targeting pushed it past the unpruned 128e on every code axis.
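Because these benches are small, it can help to read the percentages as problem counts. The sketch below converts the pass rates in the table above back to solved problems. HumanEval and HumanEval+ have 164 problems; the LCB split sizes (55 and 100) are an assumption read off the column names.

```python
# Problem counts: HumanEval / HumanEval+ have 164 problems. The LCB sizes are
# an assumption taken from the column names ("-55", "-100").
BENCH_SIZES = {"HE": 164, "HE+": 164, "LCB-med-55 v4": 55, "LCB-med-100": 100}
SCORES = {"HE": 98.78, "HE+": 93.29, "LCB-med-55 v4": 96.36, "LCB-med-100": 96.00}


def solved_count(pass_rate_pct, num_problems):
    """Convert a pass@1 percentage back to a whole number of solved problems."""
    return round(pass_rate_pct / 100 * num_problems)


def points_per_problem(num_problems):
    """Percentage points gained or lost by flipping a single problem."""
    return 100 / num_problems


for name, pct in SCORES.items():
    n = BENCH_SIZES[name]
    print(f"{name}: {solved_count(pct, n)}/{n}, "
          f"one problem = {points_per_problem(n):.2f}pp")
```

On this reading, the −0.61 HE gap to v5-coder corresponds to a single HumanEval problem (100/164 ≈ 0.61), and if the LCB deltas were measured on the 55-problem split, +1.81pp is about one problem (100/55 ≈ 1.82).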
Plain K-quants rebuilt with imatrix (calibration-data-v5) — biggest lift on the low-bit tiers (Q3_K_M 85.98 → 92.68). Q4_K_M is the one exception, built without imatrix (imatrix lowered it: 90.85 vs 92.07 plain). IQ-tiers always carry imatrix.
| Quant | Size (GB) | bpw | HE+ pass@1 |
|---|---|---|---|
| F16 | 39.79 | 16.00 | — (baseline) |
| Q8_0 | 21.16 | 8.14 | 93.29% |
| Q6_K_L | 17.98 | 6.91 | 92.68% |
| Q6_K | 17.81 | 6.84 | 93.29% |
| CD-Q6_K | 15.54 | 5.97 | 92.07% |
| Q5_K_L | 15.25 | 5.86 | 91.46% |
| Q5_K_M | 15.07 | 5.79 | 92.68% |
| Q5_K_S | 14.19 | 5.45 | 92.07% |
| Q4_K_L | 13.42 | 5.16 | 92.68% |
| Q4_K_M | 13.24 | 5.09 | 92.07% (plain) |
| CD-Q5_K_M | 13.07 | 5.03 | 90.85% |
| Q4_1 | 12.61 | 4.85 | 91.46% |
| Q4_K_S | 12.21 | 4.69 | 92.68% |
| IQ4_NL | 11.42 | 4.39 | 91.46% |
| Q4_0 | 11.42 | 4.39 | 90.85% |
| ⭐ IQ4_XS | 11.01 | 4.23 | 92.07% |
| Q3_K_L | 10.94 | 4.21 | 92.07% |
| Q3_K_XL | 10.69 | 4.11 | 92.07% |
| CD-Q4_K_M | 10.65 | 4.10 | 91.46% |
| ⭐ Q3_K_M | 10.51 | 4.04 | 92.68% (value leader) |
| CD-IQ4_K_M | 10.29 | 3.96 | 91.46% |
| IQ3_M | 9.82 | 3.78 | 92.07% |
| Q3_K_S | 9.68 | 3.72 | 87.80% |
| IQ3_XS | 9.22 | 3.54 | 92.07% |
| IQ3_XXS | 8.95 | 3.44 | 90.85% |
| IQ2_M | 8.22 | 3.16 | 90.24% |
| IQ2_S | 7.83 | 3.01 | 89.02% |
| IQ2_XS | 7.77 | 2.99 | 70.73% (2-bit cliff) |
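The bpw column appears consistent with file size × 8 / total parameters, using the 20.8B figure from the header. The check below is a sketch under two assumptions: GB means 10^9 bytes, and the whole file is counted as weights.

```python
PARAMS_B = 20.8  # total parameters in billions, from the header line


def bits_per_weight(size_gb, params_b=PARAMS_B):
    """Average bits per weight, assuming size_gb is in 10^9 bytes."""
    return size_gb * 8 / params_b


# (size in GB, bpw as listed) for a few rows of the table above
ROWS = {
    "Q8_0": (21.16, 8.14),
    "Q6_K": (17.81, 6.84),
    "IQ4_XS": (11.01, 4.23),
    "Q3_K_M": (10.51, 4.04),
    "IQ2_XS": (7.77, 2.99),
}

for name, (size_gb, listed) in ROWS.items():
    print(f"{name}: computed {bits_per_weight(size_gb):.2f}, listed {listed:.2f}")
```

The sampled rows agree with the listed values to within about 0.01. The F16 row is the exception: it lists the nominal 16.00, whereas this ratio would give roughly 15.3.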
Recommended: IQ4_XS (11.01 GB) is the safe 4-bit default; Q3_K_M (10.51 GB) is the smallest tier in the 92.68% top cluster (value leader); IQ3_XS (9.22 GB) the smallest on the 92.07% plateau; IQ2_S (7.83 GB) for sub-8 GB. Avoid IQ2_XS.
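The same picks can be reproduced mechanically from the table. The sketch below selects the highest-HE+ tier that fits a disk budget, breaking ties toward the smaller file. It ranks by HE+ alone and ignores the extra memory needed for context, so treat it as a starting point rather than a sizing tool.

```python
# (tier, size in GB, HE+ pass@1 in %) copied from the quant table above.
# F16 is omitted because it has no HE+ score listed.
QUANTS = [
    ("Q8_0", 21.16, 93.29), ("Q6_K_L", 17.98, 92.68), ("Q6_K", 17.81, 93.29),
    ("CD-Q6_K", 15.54, 92.07), ("Q5_K_L", 15.25, 91.46), ("Q5_K_M", 15.07, 92.68),
    ("Q5_K_S", 14.19, 92.07), ("Q4_K_L", 13.42, 92.68), ("Q4_K_M", 13.24, 92.07),
    ("CD-Q5_K_M", 13.07, 90.85), ("Q4_1", 12.61, 91.46), ("Q4_K_S", 12.21, 92.68),
    ("IQ4_NL", 11.42, 91.46), ("Q4_0", 11.42, 90.85), ("IQ4_XS", 11.01, 92.07),
    ("Q3_K_L", 10.94, 92.07), ("Q3_K_XL", 10.69, 92.07), ("CD-Q4_K_M", 10.65, 91.46),
    ("Q3_K_M", 10.51, 92.68), ("CD-IQ4_K_M", 10.29, 91.46), ("IQ3_M", 9.82, 92.07),
    ("Q3_K_S", 9.68, 87.80), ("IQ3_XS", 9.22, 92.07), ("IQ3_XXS", 8.95, 90.85),
    ("IQ2_M", 8.22, 90.24), ("IQ2_S", 7.83, 89.02), ("IQ2_XS", 7.77, 70.73),
]


def best_under(budget_gb, quants=QUANTS):
    """Highest-HE+ tier whose file fits the budget; ties go to the smaller file."""
    fitting = [q for q in quants if q[1] <= budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: (q[2], -q[1]))


for budget in (8.0, 10.0, 11.0, 24.0):
    print(budget, best_under(budget))
```

With these budgets the function returns IQ2_S, IQ3_XS, Q3_K_M, and Q6_K respectively. IQ4_XS never wins on HE+ alone, since Q3_K_M is both smaller and higher-scoring, so the "safe 4-bit default" label reflects a judgment beyond this single metric.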
Same rig (single RTX 3090), same recipe (llama-server -c 32768 -ngl 99 --parallel 2, omk_eval, humanevalplus_full, greedy T=0). Qwen GGUFs are bartowski’s Qwen2.5-Coder-14B-Instruct-GGUF. Pairing by tier name is misleading because v6-coder is a 20.8B MoE and Qwen is 14.7B dense, so the fair comparison is iso-disk: at a given GB budget, which model wins HE+? In the bands below, v6-coder runs roughly 1.3–2.8 bpw lower at comparable disk size and still scores higher in every band where Qwen has an entry.
| Disk band | Qwen2.5-Coder-14B (size / bpw / HE+) | v6-coder best (size / bpw / HE+) | Δ HE+ |
|---|---|---|---|
| ~21 GB | (none — Qwen ceiling is Q8_0 15.70) | Q8_0 21.16 / 8.14 / 93.29% | — |
| ~18 GB | (none) | Q6_K 17.81 / 6.84 / 93.29% | new top |
| ~15.7 GB | Q8_0 15.70 / 8.54 / 84.76% | Q5_K_M 15.07 / 5.79 / 92.68% | +7.92 |
| ~13 GB | Q6_K 12.12 / 6.60 / 84.76% | Q4_K_L 13.42 / 5.16 / 92.68% | +7.92 |
| ~12 GB | Q6_K 12.12 / 6.60 / 84.76% | Q4_K_S 12.21 / 4.69 / 92.68% | +7.92 |
| ~11 GB | Q5_K_M 10.51 / 5.72 / 83.54% | ⭐ IQ4_XS 11.01 / 4.23 / 92.07% | +8.53 |
| ~10.5 GB (iso-disk) | Q5_K_M 10.51 / 5.72 / 83.54% | ⭐ Q3_K_M 10.51 / 4.04 / 92.68% | +9.14 |
| ~10 GB | Q5_K_M 10.51 / 5.72 / 83.54% | IQ3_M 9.82 / 3.78 / 92.07% | +8.53 |
| ~9.2 GB | Q4_K_M 8.99 / 4.89 / 85.37% | IQ3_XS 9.22 / 3.54 / 92.07% | +6.70 |
| ~9 GB | Q4_K_M 8.99 / 4.89 / 85.37% | IQ3_XXS 8.95 / 3.44 / 90.85% | +5.48 |
| ~8 GB | IQ4_XS 8.12 / 4.42 / 84.76% | IQ2_M 8.22 / 3.16 / 90.24% | +5.48 |
| ~7.8 GB | IQ4_XS 8.12 / 4.42 / 84.76% | IQ2_S 7.83 / 3.01 / 89.02% | +4.26 |
Qwen sits at 83–85% HE+ across its whole ladder; the iso-disk ~10.5 GB point is the cleanest read — v6 Q3_K_M (4.04 bpw / 92.68%) vs Qwen Q5_K_M (5.72 bpw / 83.54%): +9.14pp at the exact same file size, −1.68 bpw. Same-rig LCB-medium-55 v4 on the identical split: 96.36 vs 18.18 (read as a same-rig measurement — v6-coder was specifically LCB-v4-targeted; not a general-coding ranking).
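The Δ HE+ column and the bpw gaps can be recomputed from the rows of the comparison table. The sketch below does that for each band where both models have an entry; the tuple layout and function name are illustrative, the numbers are copied from the table.

```python
# (band, Qwen (size GB, bpw, HE+ %), v6-coder (size GB, bpw, HE+ %))
PAIRS = [
    ("~15.7 GB", (15.70, 8.54, 84.76), (15.07, 5.79, 92.68)),
    ("~13 GB",   (12.12, 6.60, 84.76), (13.42, 5.16, 92.68)),
    ("~12 GB",   (12.12, 6.60, 84.76), (12.21, 4.69, 92.68)),
    ("~11 GB",   (10.51, 5.72, 83.54), (11.01, 4.23, 92.07)),
    ("~10.5 GB", (10.51, 5.72, 83.54), (10.51, 4.04, 92.68)),
    ("~10 GB",   (10.51, 5.72, 83.54), (9.82, 3.78, 92.07)),
    ("~9.2 GB",  (8.99, 4.89, 85.37), (9.22, 3.54, 92.07)),
    ("~9 GB",    (8.99, 4.89, 85.37), (8.95, 3.44, 90.85)),
    ("~8 GB",    (8.12, 4.42, 84.76), (8.22, 3.16, 90.24)),
    ("~7.8 GB",  (8.12, 4.42, 84.76), (7.83, 3.01, 89.02)),
]


def compare(qwen, v6):
    """Return (HE+ gain in pp, bpw saved, size difference in GB) for v6 vs Qwen."""
    he_gain = round(v6[2] - qwen[2], 2)
    bpw_saved = round(qwen[1] - v6[1], 2)
    size_diff = round(v6[0] - qwen[0], 2)
    return he_gain, bpw_saved, size_diff


for band, qwen, v6 in PAIRS:
    he_gain, bpw_saved, size_diff = compare(qwen, v6)
    print(f"{band}: +{he_gain:.2f}pp HE+, {bpw_saved:.2f} bpw lower, "
          f"{size_diff:+.2f} GB vs Qwen")
```

The size difference column is worth watching: only the ~10.5 GB band is exactly iso-disk, and in a few bands the v6-coder file is larger than the Qwen file it is paired with.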
```bash
ollama pull mannix/gemma4-98e-v6-coder          # :latest = Q4_K_M
ollama pull mannix/gemma4-98e-v6-coder:IQ4_XS   # safe 4-bit (92.07%)
ollama pull mannix/gemma4-98e-v6-coder:Q3_K_M   # value leader, 10.5 GB (92.68%)
ollama pull mannix/gemma4-98e-v6-coder:Q6_K     # max fidelity (93.29%)
ollama pull mannix/gemma4-98e-v6-coder:IQ2_S    # sub-8 GB (89.02%)
```
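Once a tag is pulled, it can be called from code through Ollama's local REST API. This is a minimal sketch using only the standard library. It assumes a default Ollama install listening on `localhost:11434`, and the helper names `build_request` and `generate` are illustrative. Temperature 0 mirrors the greedy setting used for the scores above, though results through Ollama may not match the llama-server runs exactly.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint
MODEL_TAG = "mannix/gemma4-98e-v6-coder:IQ4_XS"     # any pulled tag works


def build_request(prompt, model=MODEL_TAG, temperature=0.0):
    """Build a non-streaming POST request for Ollama's /api/generate."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a chunk stream
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=300) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server with the tag already pulled):
#   print(generate("Write a Python function that reverses a singly linked list."))
```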
Derivative of Gemma 4 — Gemma Terms of Use.