Gemma 4 26B-A4B 98e v7-coder — agentic-coding specialist (STD16)

~20.8B params · 98 experts (30 dropped) · ~4B active · fk16 code/LCB-protective drop + agentic_eog loop force-keep + shared-FFN α=1.2 (no DERN)

A research checkpoint that prunes Gemma-4-26B-A4B-it from 128 to 98 experts per layer, then upweights the shared FFN (α=1.2). Expert selection is a code- and LiveCodeBench-protective competence-map drop that also force-keeps the 46 experts that a loop-protection signal (agentic_eog, the same loop-protection set the sibling v7-coderx carries) flags as load-bearing for clean multi-turn termination. That is why the multi-turn agentic repetition loop is gone at the weights, not papered over by the sampler. Same router, attention and norms as base.
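The drop-with-force-keep selection described above can be illustrated with a minimal sketch. It assumes each expert in a layer has a single scalar competence score and that the protected expert ids are already known. The scores and ids below are random placeholders, and `select_experts` is a hypothetical helper, not the pipeline used to build this checkpoint.

```python
import random

NUM_EXPERTS = 128   # experts per layer in the base model
NUM_KEEP = 98       # experts per layer after pruning
NUM_PROTECTED = 46  # size of the force-keep set named in the card


def select_experts(scores, force_keep, num_keep):
    """Return the sorted ids of experts to keep in one layer.

    Protected experts are always kept; the remaining slots go to the
    highest-scoring unprotected experts.
    """
    protected = set(force_keep)
    if len(protected) > num_keep:
        raise ValueError("force-keep set is larger than the keep budget")
    # Rank unprotected experts by score, best first.
    candidates = sorted(
        (e for e in range(len(scores)) if e not in protected),
        key=lambda e: scores[e],
        reverse=True,
    )
    free_slots = num_keep - len(protected)
    return sorted(protected | set(candidates[:free_slots]))


# Placeholder inputs: real scores and the real protected set are not published here.
rng = random.Random(0)
scores = [rng.random() for _ in range(NUM_EXPERTS)]
force_keep = rng.sample(range(NUM_EXPERTS), NUM_PROTECTED)

kept = select_experts(scores, force_keep, NUM_KEEP)
dropped = sorted(set(range(NUM_EXPERTS)) - set(kept))
print(len(kept), len(dropped))  # 98 30
```

In this reading, the force-keep set only changes which experts compete for the remaining 52 slots; the total of 30 dropped experts per layer stays fixed.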

An agentic coder, fixed at the weights, that traded chemistry for stability. v7-coder holds a top-tier code/agentic profile (HumanEval 98.17%, HumanEval+ 92.07%, LiveCodeBench-medium-55 98.18%, IFEval 92%, AIME 80.0%, MATH500 95%; Q6_K, greedy), while the agentic repetition loop that affected the earlier v7-coder is eliminated at the weights (0 loops across 48 seeds on every published tier). The price is graduate science: GPQA-diamond drops to 51.52%, and that entire loss is chemistry (−34pp) and biology (−32pp); physics is preserved (+3.5pp). An agentic-coding workhorse, not a science model.

Full model card & methodology: ManniX-ITA/gemma-4-A4B-98e-v7-coder-it on Hugging Face.

Other formats:
- GGUF (13 loop-clean tiers, imatrix, CD-Q2_K + F16 + mmproj): ManniX-ITA/gemma-4-A4B-98e-v7-coder-it-GGUF
- NVFP4A16 (native vLLM, ~13 GB): ManniX-ITA/gemma-4-A4B-98e-v7-coder-NVFP4A16

Scores (Q6_K, llama.cpp, greedy, same host)

GPQA-D	AIME	MATH-500	GSM8K	ARC	IFEval	HE	HE+	LCB-55	LCB-100	MultiPL-E
51.52	80.00	95.00	91.00	92.15	92.00	98.17	92.07	98.18	94.00	89.67

Reference columns on the same Q6_K run: unpruned 128e GPQA 67.17 / AIME 73.33 / HE 97.56 / LCB-55 96.36; v6-coder GPQA 61.11 / HE 98.17 / LCB-55 92.73. On these code/math axes v7-coder is at or above the unpruned 128e (AIME +6.7, MATH500 +3, GSM8K +2, LCB-55 +1.8); the one exception reported on this page is LCB-medium-100, where 128e leads 97.00 to 94.00. Apart from that, the whole trade is in graduate science.

The science trade — GPQA-diamond by domain (same greedy Q6_K run):

Domain	n	128e	v7-coder	Δ
Physics	86	82.56	86.05	+3.49
Chemistry	93	59.14	24.73	−34.41
Biology	19	57.89	26.32	−31.58
Overall	198	69.19	51.52	−17.68

The −18pp GPQA gap is not broad science loss — it is chemistry and biology, with physics untouched (organic chemistry, the bulk of GPQA chemistry, drops ~27%→~10%). Treat v7-coder as an agentic-coding model that keeps physics-style reasoning, not a chemistry/graduate-science assistant.
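As a quick consistency check on the per-domain table, the overall GPQA-diamond figures can be recomputed as a question-weighted average. The sketch below uses only the numbers in the table; recovering whole-question counts by rounding is an assumption about how the percentages were derived.

```python
# (domain, questions, 128e accuracy %, v7-coder accuracy %), copied from the table
GPQA_BY_DOMAIN = [
    ("Physics", 86, 82.56, 86.05),
    ("Chemistry", 93, 59.14, 24.73),
    ("Biology", 19, 57.89, 26.32),
]


def correct_count(n, accuracy_pct):
    """Recover the number of correct answers from a rounded percentage."""
    return round(n * accuracy_pct / 100)


def overall_accuracy(rows, column):
    """Question-weighted accuracy over all domains; column 2 = 128e, 3 = v7-coder."""
    total = sum(row[1] for row in rows)
    correct = sum(correct_count(row[1], row[column]) for row in rows)
    return 100 * correct / total


base = overall_accuracy(GPQA_BY_DOMAIN, 2)
pruned = overall_accuracy(GPQA_BY_DOMAIN, 3)
print(f"{base:.2f} {pruned:.2f} {pruned - base:+.2f}")  # 69.19 51.52 -17.68

# Questions lost (positive) or gained (negative) per domain.
lost = {
    name: correct_count(n, a) - correct_count(n, b)
    for name, n, a, b in GPQA_BY_DOMAIN
}
print(lost)  # {'Physics': -3, 'Chemistry': 32, 'Biology': 6}
```

Read this way, chemistry alone accounts for 32 of the net 35 questions lost, which is why the overall gap tracks the chemistry row so closely.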

LiveCodeBench across problem sets

v7-coder’s code score depends on the LCB slice — all cells are the same greedy Q6_K llama.cpp run on the published builds. On the all-hard 77q set — the most demanding and the most discriminating across the cohort — v7-coder scores 84.42%, above the unpruned 128e (79.22%) and just under the code-maximal sibling v7-coderx (85.71%).

LCB problem set	128e	v7-coder	v7-coderx
LCB-medium-55 (v4, 55q)	96.36%	98.18%	92.73%
LCB-medium-100 (v4, 100q)	97.00%	94.00%	91.00%
LCB-hard-77 (all-hard, 77q)	79.22%	84.42%	85.71%

Quantizations — HE+ / MultiPL-E-100, size & bpw (deployment sampler)

The 13 loop-clean, published tiers were scored on HumanEval+ (164) and MultiPL-E-100 under the deployment sampler (vendor_minp_rep: per-tier temp 0.9/0.8, top_p 0.95, top_k 64, min_p 0.05, repeat_penalty 1.1; the anti-loop serving config every tag ships with), llama.cpp b9700. Size is the on-disk GGUF in GB (decimal, 10⁹ bytes); bpw = 8 × bytes ÷ 19,877,953,946. ⭐ marks a recommended pick. The low-bit Q3_K_S/Q3_K_XL/Q2_K_L, the QAT qat-Q4_0/CD-qat-Q4_K_M, and the i-quant CD-IQ2_NL tiers fail the 48-seed agentic loop gate and are not published.

Tier	Size (GB)	bpw	HE+ %	MPE-100 %
Q8_0	21.16	8.52	91.46	91.00
Q6_K_L	17.98	7.24	91.46	89.67
⭐ Q6_K	17.81	7.17	90.85	89.00
Q5_K_L	15.25	6.14	91.46	90.00
Q5_K_M	15.07	6.07	91.46	90.33
Q4_K_L	13.42	5.40	92.07	89.33
⭐ Q4_K_M	13.24	5.33	92.68	89.00
Q4_K_S	12.21	4.91	92.68	89.33
⭐ IQ4_NL	11.42	4.60	89.63	88.67
IQ4_XS	11.01	4.43	91.46	88.33
Q3_K_L	10.94	4.40	90.85	89.00
Q3_K_M	10.51	4.23	91.46	87.00
⭐ CD-Q2_K	8.82	3.55	89.63	86.00
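The bpw column follows directly from the formula stated above the table. A minimal sketch, using the rounded sizes from the table (so a result can differ from the published value in the last digit for some tiers):

```python
BPW_DENOMINATOR = 19_877_953_946  # weight count used in the card's bpw formula


def bits_per_weight(size_gb):
    """bpw = 8 x bytes / weights, with size given in decimal GB (10**9 bytes)."""
    size_bytes = size_gb * 1e9
    return 8 * size_bytes / BPW_DENOMINATOR


for tier, size_gb in [("Q8_0", 21.16), ("Q4_K_M", 13.24), ("CD-Q2_K", 8.82)]:
    print(f"{tier}: {bits_per_weight(size_gb):.2f} bpw")
# Q8_0: 8.52 bpw
# Q4_K_M: 5.33 bpw
# CD-Q2_K: 3.55 bpw
```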

Recommended picks (all 0 loops on the 48-seed agentic gate):

Q6_K ⭐ (17.81 GB) — max quality — 90.85% HE+ / 89.00% MPE.
Q4_K_M ⭐ (13.24 GB) — best size/quality default (:latest) — 92.68% HE+ / 89.00% MPE.
IQ4_NL ⭐ (11.42 GB) — compact 4-bit, fits a 12 GB GPU — 89.63% HE+ / 88.67% MPE.
CD-Q2_K ⭐ (8.82 GB) — smallest loop-clean tier (a 2-bit K-quant ContribDynamic body) — 89.63% HE+ / 86.00% MPE.

Across the 13 tiers HE+ holds in the 89.6–92.7% band and MPE in the 86–91% band — no code-quality cliff down to CD-Q2_K (8.82 GB). Prefer Q4_K_M or higher for production.

Head-to-head by file size — v7-coder vs Qwen2.5-Coder-14B (iso-disk)

Pairing by tier name is misleading: v7-coder is a ~20.8B-total MoE (~4B active) and Qwen2.5-Coder-14B is a 14.7B dense model, so the same tier name lands at a different file size. The fair comparison is iso-disk: at a given GB budget, which scores higher on HumanEval+? v7-coder cells are the published loop-clean tiers at their deployment-sampler HE+; Qwen GGUFs are bartowski’s at greedy T=0 (83–85% across the ladder). At every band the MoE runs lower bpw at a comparable file size and still scores higher.

Disk band	Qwen2.5-Coder-14B (size / bpw / HE+)	v7-coder best (size / bpw / HE+)	Δ HE+
~17.8 GB	(none — Qwen ceiling Q8_0 15.70 GB)	Q6_K 17.81 / 7.17 / 90.85%	new top
~15.1 GB	Q8_0 15.70 / 8.54 / 84.76%	Q5_K_M 15.07 / 6.07 / 91.46%	+6.70
~13.2 GB	Q6_K 12.12 / 6.60 / 84.76%	Q4_K_M 13.24 / 5.33 / 92.68%	+7.92
~12.2 GB	Q6_K 12.12 / 6.60 / 84.76%	Q4_K_S 12.21 / 4.91 / 92.68%	+7.92
~11.4 GB	Q5_K_M 10.51 / 5.72 / 83.54%	IQ4_NL 11.42 / 4.60 / 89.63%	+6.09
~10.9 GB	Q5_K_M 10.51 / 5.72 / 83.54%	Q3_K_L 10.94 / 4.40 / 90.85%	+7.31
~10.5 GB	Q5_K_M 10.51 / 5.72 / 83.54%	Q3_K_M 10.51 / 4.23 / 91.46% — iso-disk	+7.92
~8.8 GB	Q4_K_M 8.99 / 4.89 / 85.37%	CD-Q2_K 8.82 / 3.55 / 89.63% — ⭐ smallest 89%+	+4.26

Qwen HE+ is greedy; v7-coder HE+ is the deployment sampler (the config this model ships with). At ~10.5 GB, v7-coder Q3_K_M (4.23 bpw / 91.46%) beats Qwen Q5_K_M (5.72 bpw / 83.54%) by +7.92pp at the exact same file size and −1.49 bpw.
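The Δ column can be reproduced from the paired cells. The sketch below copies three pairs from the table as given and does not try to model how the pairs were chosen; `Build` and `compare` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    name: str
    size_gb: float
    bpw: float
    he_plus: float  # HumanEval+ pass rate in percent


# (Qwen2.5-Coder-14B build, v7-coder build) pairs copied from the table
PAIRS = [
    (Build("Qwen Q8_0", 15.70, 8.54, 84.76), Build("v7-coder Q5_K_M", 15.07, 6.07, 91.46)),
    (Build("Qwen Q5_K_M", 10.51, 5.72, 83.54), Build("v7-coder Q3_K_M", 10.51, 4.23, 91.46)),
    (Build("Qwen Q4_K_M", 8.99, 4.89, 85.37), Build("v7-coder CD-Q2_K", 8.82, 3.55, 89.63)),
]


def compare(qwen, v7):
    """Return (delta HE+ in pp, delta bpw), both as v7-coder minus Qwen."""
    return round(v7.he_plus - qwen.he_plus, 2), round(v7.bpw - qwen.bpw, 2)


for qwen, v7 in PAIRS:
    d_he, d_bpw = compare(qwen, v7)
    print(f"{v7.name} vs {qwen.name}: {d_he:+.2f} pp HE+, {d_bpw:+.2f} bpw")
```

For the ~10.5 GB pair this gives +7.92 pp at −1.49 bpw, matching the figures quoted above. The two sides still use different samplers, so the deltas describe shipped configurations rather than a controlled comparison.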

Pull

ollama pull mannix/gemma4-98e-v7-coder                 # :latest = Q4_K_M (92.68% HE+, 13.2 GB)
ollama pull mannix/gemma4-98e-v7-coder:Q6_K            # max quality — 90.85% HE+, 17.8 GB
ollama pull mannix/gemma4-98e-v7-coder:IQ4_NL          # compact 4-bit, fits 12 GB GPU — 89.63% HE+
ollama pull mannix/gemma4-98e-v7-coder:CD-Q2_K         # smallest loop-clean — 89.63% HE+, 8.8 GB
ollama pull mannix/gemma4-98e-v7-coder:vision-Q4_K_M   # + SigLIP vision tower

Every tag ships the deployment (anti-loop) sampler baked in: top_p 0.95, top_k 64, min_p 0.05, repeat_penalty 1.1, num_ctx 32768, and the per-tier temperature (0.8 for Q8_0/Q4_K_L/Q3_K_L, 0.9 for the rest, from the 48-seed loop gate). ollama 0.30.x has no reasoning-budget flag, so num_ctx 32768 is what bounds the thinking phase there; keep it well above the reasoning budget you intend to allow. For llama.cpp serving, add --reasoning-format deepseek --reasoning-budget 8192 -c 32768.
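Because the sampler is baked into each tag, nothing needs to be set for normal use. For clients that pass options explicitly, the same settings can be written out as a request body. The sketch below only builds and prints the JSON for Ollama's `/api/generate` endpoint and sends nothing; `sampler_options` and `generate_request` are hypothetical helpers, and the assumption that a tag name equals its tier name holds for the tags listed under Pull but is not confirmed for every tier.

```python
import json

# Tiers the card lists at temperature 0.8; every other tier uses 0.9.
LOW_TEMP_TIERS = {"Q8_0", "Q4_K_L", "Q3_K_L"}


def sampler_options(tier):
    """Deployment sampler settings for a tier, as described in the card."""
    return {
        "temperature": 0.8 if tier in LOW_TEMP_TIERS else 0.9,
        "top_p": 0.95,
        "top_k": 64,
        "min_p": 0.05,
        "repeat_penalty": 1.1,
        "num_ctx": 32768,
    }


def generate_request(tier, prompt):
    """Build a non-streaming /api/generate request body (not sent here)."""
    return {
        "model": f"mannix/gemma4-98e-v7-coder:{tier}",  # assumes tag == tier name
        "prompt": prompt,
        "stream": False,
        "options": sampler_options(tier),
    }


body = generate_request("Q6_K", "Write a binary search in Python.")
print(json.dumps(body, indent=2))
```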

Derivative of Gemma 4 — Gemma Terms of Use.

A further improved version of the Gemma 4 98e coder variant, positioned as the strongest coder in the ~20B class.
