Gemma 4 26B-A4B 98e v7-coderx — code-maximal prune

20.8B params · 98 experts (30 dropped) · ~4B active · code-maximal drop map

A research checkpoint that takes Gemma-4-26B-A4B-it and drops 30 of the 128 experts per layer using a code-maximal recipe on the rebuilt v7 competence maps (audited producers, 10 classes). The recipe weights generic code 4× and LiveCodeBench-medium 3× (generate_drop_map_v5fk, no per-layer floor clamp), applies no science or multilingual targeting, and adds an agentic loop-protection force-keep (46 experts) so the served model does not loop. Router, attention, and norms are unchanged from base, and the checkpoint carries the mandatory shared-FFN α=1.2 upweight that every coder variant carries.
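The selection step described above can be pictured with a minimal sketch. This is not generate_drop_map_v5fk: the function name, the data layout (one competence list per class, one value per expert) and the rule "lowest weighted competence is dropped first" are all assumptions made for illustration. Only the 128/30 counts and the 4× / 3× class weights come from the card.

```python
# Hypothetical sketch: names, data layout and scoring rule are assumptions,
# not the actual generate_drop_map_v5fk implementation.
from typing import Dict, List, Set

N_EXPERTS = 128  # experts per layer in the base model
N_DROP = 30      # experts removed per layer, leaving 98

# Weights from the card: generic code 4x, LiveCodeBench-medium 3x.
# Assumption: classes without a weight contribute nothing to the score.
CLASS_WEIGHTS: Dict[str, float] = {"generic_code": 4.0, "lcb_medium": 3.0}


def layer_drop_set(competence: Dict[str, List[float]], force_keep: Set[int]) -> Set[int]:
    """Return the N_DROP lowest-scoring experts of one layer, never a force-kept one."""
    score = [0.0] * N_EXPERTS
    for cls, weight in CLASS_WEIGHTS.items():
        for expert, value in enumerate(competence.get(cls, [0.0] * N_EXPERTS)):
            score[expert] += weight * value
    candidates = [e for e in range(N_EXPERTS) if e not in force_keep]
    candidates.sort(key=lambda e: (score[e], e))  # lowest weighted competence first
    return set(candidates[:N_DROP])


# Toy layer: competence grows with the expert index, experts 0 and 1 are force-kept.
toy = {"generic_code": [float(e) for e in range(N_EXPERTS)],
       "lcb_medium": [0.0] * N_EXPERTS}
dropped = layer_drop_set(toy, force_keep={0, 1})
print(sorted(dropped))
```

In the toy layer the two weakest experts survive only because they are force-kept, which is the effect the loop-protection list is meant to have.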

The cohort’s code specialist. On the all-hard LiveCodeBench-77, the most demanding LCB slice, it scores 85.71%, the highest in the cohort (128e 79.22, v7-coder 84.42), and it also leads on HumanEval+ (93.29), MATH-500 (95.00) and AIME (76.67). On the easier LCB-medium-v4 slices it sits a little below the generalists (LCB-55 92.73 vs 128e 96.36). This is the loop-fixed build: it force-keeps the agentic loop-protection experts and replaces the earlier looping fs2440 prune. The trade-off is graduate science (GPQA 51.01). Its published sibling v7-coder is the other STD16-family coder: it leads LCB-medium (LCB-55 98.18) and HumanEval (98.17), while v7-coderx leads the all-hard LCB-77 and HE+. Both sit near GPQA 51, so neither is a science model.

Full model card & methodology: ManniX-ITA/gemma-4-A4B-98e-v7-coderx-it on Hugging Face.

Other formats:

- GGUF (13 tiers, imatrix K-quants + CD-Q2_K mix + F16 + imatrix.dat + mmproj): ManniX-ITA/gemma-4-A4B-98e-v7-coderx-it-GGUF
- NVFP4A16 (native vLLM, ~13 GB): ManniX-ITA/gemma-4-A4B-98e-v7-coderx-NVFP4A16

Scores (Q6_K, llama.cpp, greedy, same host)

LCB-55	LCB-100	MultiPL-E	HE	HE+	IFEval	GSM8K	MATH-500	AIME	ARC	GPQA-D
92.73	91.00	89.00	96.95	93.29	92.00	93.00	95.00	76.67	86.60	51.01

Reference scores from the same Q6_K run: unpruned 128e LCB-55 96.36 / LCB-100 97.00 / hard-77 79.22; v6-coder LCB-55 92.73 / LCB-100 94.00. v7-coderx leads the cohort on the all-hard LCB-77 and on HE+, MATH-500 and AIME. The cost is paid on graduate science (GPQA 51.01 vs 128e 67.17) and on the easier instruction-following and ARC axes.

LiveCodeBench across problem sets

v7-coderx’s code score depends on the LCB slice. All cells come from the same greedy Q6_K llama.cpp run. The all-hard 77q set is the most demanding and the most discriminating across the cohort, and it is where v7-coderx leads.

LCB problem set	128e	v7-coder	v7-coderx
LCB-medium-55 (v4, 55q)	96.36%	96.36%	92.73%
LCB-medium-100 (v4, 100q)	97.00%	97.00%	91.00%
LCB-v6-55 (55q) †	—	92.73%	98.18%
LCB-hard-77 (all-hard, 77q)	79.22%	84.42%	85.71%

† LCB-v6-55 is a small, noisier 55-problem slice (no greedy 128e baseline was run); all-hard 77q is the reference for cross-model comparison.

Quantizations — HE+ / MultiPL-E-100 score, size & answer length

Every K-quant and CD tier was scored on HumanEval+ (164) and MultiPL-E-100 (llama.cpp, greedy T=0), with per-problem completion length from token_stats. bpw is the true bits-per-weight (8 × bytes ÷ 19,877,953,946). ⭐ marks a recommended pick.

Tier	Size (GB)	bpw	HE+ %	HE+ tok p50/p90/max	MPE-100 %	MPE tok p50/p90/max
Q8_0	21.16	8.52	92.07	230/431/1002	88.33	85/188/1012
Q6_K_L	17.98	7.24	92.68	229/465/1748	89.67	84/179/603
⭐ Q6_K	17.81	7.17	92.07	238/442/3897	90.33	84/178/975
Q5_K_L	15.25	6.14	92.07	230/483/3374	89.00	84/193/1011
Q5_K_M	15.07	6.07	90.24	228/445/1715	89.67	85/184/981
Q4_K_L	13.42	5.40	92.07	251/457/1838	89.33	86/187/982
Q4_K_M	13.24	5.33	92.07	247/448/1407	88.33	85/182/1006
⭐ Q4_K_S	12.21	4.91	93.29	245/445/2187	89.00	84/190/965
⭐ IQ4_NL	11.42	4.60	91.46	229/447/1409	91.33	84/180/902
IQ4_XS	11.01	4.43	90.85	231/426/1321	89.67	84/180/687
Q3_K_L	10.94	4.40	92.68	240/425/8163	88.67	86/207/1003
Q3_K_M	10.51	4.23	92.07	239/422/2496	89.33	86/204/875
⭐ CD-Q2_K	8.82	3.55	89.02	229/509/3334	87.00	92/210/1010
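The bpw column can be recomputed from the Size column with the formula quoted above the table. A minimal sketch, assuming the sizes are decimal gigabytes (10^9 bytes), which is the reading that reproduces the published values. The function name is illustrative, and because the inputs are the rounded sizes from the table, the results are only meaningful to two decimals.

```python
N_WEIGHTS = 19_877_953_946  # denominator given in the card


def bits_per_weight(size_gb: float, n_weights: int = N_WEIGHTS) -> float:
    """bpw = 8 * bytes / weights, with size_gb assumed to be decimal GB."""
    return 8 * size_gb * 1e9 / n_weights


# Sizes copied from the table (GB).
sizes_gb = {"Q8_0": 21.16, "Q6_K": 17.81, "Q4_K_S": 12.21, "IQ4_NL": 11.42, "CD-Q2_K": 8.82}
bpw = {tier: round(bits_per_weight(gb), 2) for tier, gb in sizes_gb.items()}

for tier, value in bpw.items():
    print(f"{tier:8s} {value:.2f}")
```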

Recommended picks:

Q6_K ⭐ (17.81 GB) — max-fidelity tier — 92.07% HE+ / 90.33% MPE-100.
Q4_K_S ⭐ (12.21 GB) — best K-quant — 93.29% HE+, the highest of any tier.
IQ4_NL ⭐ (11.42 GB) — best compact — 91.33% MPE-100 (highest of any tier) + 91.46% HE+.
CD-Q2_K ⭐ (8.82 GB) — smallest tier — 89.02% HE+ at the lowest disk.

Every tier from Q3_K_M up holds HE+ in the 90–93% band, and CD-Q2_K stays at 89.02%, with median completion length essentially identical to Q6_K. The 2-bit Q2_K_L / IQ2_XS tiers are the cliff: HE+ falls into the 80s/70s and token p90 blows out. The K-quant CD tiers are the recommended low-bit path; CD-IQ* i-quant bodies are not offered because the pruned MoE degenerates on an IQ-family body (score → 0). Prefer Q4_K_M or higher for production.

Head-to-head by file size — v7-coderx vs Qwen2.5-Coder-14B (iso-disk)

Pairing by tier name is misleading: this is a ~20.8B-total MoE and Qwen2.5-Coder-14B is a 14.7B dense model, so the same tier name lands at a different file size. The fair comparison is iso-disk: at a given GB budget, which model scores higher on HumanEval+? Both were run on the same rig (RTX 3090, llama.cpp, greedy). The Qwen GGUFs are bartowski’s (HE+ 83–85% across the ladder). In every band where both models have a file, the MoE runs at lower bpw for the same disk and still scores higher.

Disk band	Qwen2.5-Coder-14B (size / bpw / HE+)	v7-coderx best (size / bpw / HE+)	Δ HE+
~21.2 GB	(none; Qwen ceiling is Q8_0 at 15.70 GB)	Q8_0 21.16 / 8.52 / 92.07%	new top
~17.8 GB	(none; Qwen ceiling is Q8_0 at 15.70 GB)	Q6_K 17.81 / 7.17 / 92.07%	new top
~15.1 GB	Q8_0 15.70 / 8.54 / 84.76%	Q5_K_M 15.07 / 6.07 / 90.24%	+5.48
~13.2 GB	Q6_K 12.12 / 6.60 / 84.76%	Q4_K_M 13.24 / 5.33 / 92.07%	+7.31
~12.2 GB	Q6_K 12.12 / 6.60 / 84.76%	Q4_K_S 12.21 / 4.91 / 93.29% (⭐ best K-quant)	+8.53
~11.4 GB	Q5_K_M 10.51 / 5.72 / 83.54%	IQ4_NL 11.42 / 4.60 / 91.46% (⭐ best MPE-100, 91.33%)	+7.92
~11.0 GB	Q5_K_M 10.51 / 5.72 / 83.54%	IQ4_XS 11.01 / 4.43 / 90.85%	+7.31
~10.5 GB	Q5_K_M 10.51 / 5.72 / 83.54%	Q3_K_M 10.51 / 4.23 / 92.07% (same file size)	+8.53
~8.8 GB	Q4_K_M 8.99 / 4.89 / 85.37%	CD-Q2_K 8.82 / 3.55 / 89.02% (⭐ smallest)	+3.65
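The Δ HE+ column and the lower-bpw claim can be checked directly from the table. A minimal sketch: the pairings are copied from the rows above rather than derived by a matching rule, and the Quant type and variable names are illustrative only.

```python
from typing import List, NamedTuple, Tuple


class Quant(NamedTuple):
    tier: str
    size_gb: float
    bpw: float
    he_plus: float  # HumanEval+ pass rate in percent


# (Qwen2.5-Coder-14B file, v7-coderx file) for each band that has both.
PAIRS: List[Tuple[Quant, Quant]] = [
    (Quant("Q8_0", 15.70, 8.54, 84.76), Quant("Q5_K_M", 15.07, 6.07, 90.24)),
    (Quant("Q6_K", 12.12, 6.60, 84.76), Quant("Q4_K_M", 13.24, 5.33, 92.07)),
    (Quant("Q6_K", 12.12, 6.60, 84.76), Quant("Q4_K_S", 12.21, 4.91, 93.29)),
    (Quant("Q5_K_M", 10.51, 5.72, 83.54), Quant("IQ4_NL", 11.42, 4.60, 91.46)),
    (Quant("Q5_K_M", 10.51, 5.72, 83.54), Quant("IQ4_XS", 11.01, 4.43, 90.85)),
    (Quant("Q5_K_M", 10.51, 5.72, 83.54), Quant("Q3_K_M", 10.51, 4.23, 92.07)),
    (Quant("Q4_K_M", 8.99, 4.89, 85.37), Quant("CD-Q2_K", 8.82, 3.55, 89.02)),
]

# HE+ gap in percentage points, MoE minus dense, rounded like the table.
deltas = [round(moe.he_plus - qwen.he_plus, 2) for qwen, moe in PAIRS]
lower_bpw_everywhere = all(moe.bpw < qwen.bpw for qwen, moe in PAIRS)

for (qwen, moe), delta in zip(PAIRS, deltas):
    print(f"{moe.tier:8s} {moe.size_gb:5.2f} GB vs Qwen {qwen.tier:7s} {qwen.size_gb:5.2f} GB: {delta:+.2f}")
print("MoE bpw lower in every pair:", lower_bpw_everywhere)
```

Note that a few pairings are not exact size matches (for example 13.24 GB against 12.12 GB), so the gap in those rows includes a small disk advantage for one side.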

Pull

ollama pull mannix/gemma4-98e-v7-coderx                 # :latest = Q4_K_M
ollama pull mannix/gemma4-98e-v7-coderx:Q4_K_S          # ⭐ best K-quant — 93.29% HE+, 12.2 GB
ollama pull mannix/gemma4-98e-v7-coderx:IQ4_NL          # ⭐ best MPE-100 91.33% + 91.46% HE+, 11.4 GB
ollama pull mannix/gemma4-98e-v7-coderx:Q6_K            # max fidelity — 92.07% HE+ / 90.33% MPE
ollama pull mannix/gemma4-98e-v7-coderx:CD-Q2_K         # smallest — 89.02% HE+, 8.8 GB
ollama pull mannix/gemma4-98e-v7-coderx:vision-Q4_K_M   # + SigLIP vision tower

Inherits Gemma 4’s thinking format — serve with the reasoning parser enabled (--reasoning-format deepseek --reasoning-budget 8192 on llama-server).

Derivative of Gemma 4 — Gemma Terms of Use.

Gemma-4 98e coder-max variant: top-notch coding skills at the expense of science knowledge.
