192 Downloads Updated 5 hours ago
ollama run mannix/gemma4-98e-v7-coderx:vision-Q3_K_S
20.8B params · 98 experts (30 dropped) · ~4B active · code-maximal drop map
A research checkpoint that takes Gemma-4-26B-A4B-it and drops 30⁄128 experts per layer using a code-maximal recipe on the rebuilt v7 competence maps (audited producers, 10 classes) — generic-code 3× + LiveCodeBench-medium 2× on a [24,40] per-layer floor, with no science or multilingual targeting. Same router, attention, and norms as base, plus the mandatory shared-FFN α=1.2 upweight every coder variant carries.
The strongest coder in the cohort: it spends its whole prune budget on code and lands LiveCodeBench-medium-55 at 98.18% and LCB-100 at 99.0% — the highest of any Gemma-4 prune to date, +1.8pp / +2.0pp past the unpruned 128e (96.36 / 97.0). The trade is graduate science (GPQA 48.48). If you need the science back without giving up the code profile, use the sibling v7-coder (GPQA 70.71, LCB-55 96.36).
Full model card & methodology: ManniX-ITA/gemma-4-A4B-98e-v7-coderx-it on Hugging Face.
Other formats:
- GGUF (29 tiers, imatrix, CD-* per-layer mixes + F16 + mmproj): ManniX-ITA/gemma-4-A4B-98e-v7-coderx-it-GGUF
- NVFP4A16 (native vLLM, ~13 GB): ManniX-ITA/gemma-4-A4B-98e-v7-coderx-NVFP4A16
| LCB-55 | LCB-100 | MultiPL-E | HE | HE+ | IFEval | GSM8K | MATH-500 | AIME | ARC | GPQA-D |
|---|---|---|---|---|---|---|---|---|---|---|
| 98.18 | 99.00 | 90.00 | 95.73 | 92.68 | 95.00 | 91.00 | 89.00 | 70.00 | 94.28 | 48.48 |
Reference columns on the same Q6_K run: unpruned 128e LCB-55 96.36 / LCB-100 97.00 / MultiPL-E 90.00; v6-coder LCB-55 92.73 / LCB-100 94.00. v7-coderx tops the cohort on every code/instruction axis; the budget is paid almost entirely on graduate science (GPQA 48.48, vs 128e 67.17).
Per-tier HE+ was not swept for the v7 cohort — the Q6_K table above is the reference; quality below ~Q3 / 3-bit degrades on the Gemma 4 MoE, so prefer Q4_K_M or higher for production. Full 29-tier list (incl. CD-* per-layer mixes) is on the GGUF repo.
| Tier | Size (GB) | bpw | Role |
|---|---|---|---|
| Q8_0 | 21.16 | 8.14 | near-lossless |
| Q6_K | 17.81 | 6.84 | max fidelity (bench tier) |
| Q5_K_M | 15.07 | 5.79 | high quality |
| Q4_K_M | 13.24 | 5.09 | :latest — recommended default |
| Q4_K_S | 12.21 | 4.69 | compact 4-bit |
| ⭐ IQ4_XS | 11.01 | 4.23 | safe sub-12 GB 4-bit |
| Q3_K_M | 10.51 | 4.04 | smallest comfortable |
| IQ3_XS | 9.22 | 3.54 | sub-10 GB |
| IQ2_M | 8.22 | 3.16 | sub-8.5 GB (degraded) |
| IQ2_S | 7.83 | 3.01 | smallest viable |
ollama pull mannix/gemma4-98e-v7-coderx # :latest = Q4_K_M
ollama pull mannix/gemma4-98e-v7-coderx:IQ4_XS # safe 4-bit, sub-12 GB
ollama pull mannix/gemma4-98e-v7-coderx:Q6_K # max fidelity (bench tier)
ollama pull mannix/gemma4-98e-v7-coderx:IQ2_S # sub-8 GB
ollama pull mannix/gemma4-98e-v7-coderx:vision-Q4_K_M # + SigLIP vision tower
Inherits Gemma 4’s thinking format — serve with the reasoning parser enabled (--reasoning-format deepseek --reasoning-budget 8192 on llama-server).
Derivative of Gemma 4 — Gemma Terms of Use.