Details

Updated 2 days ago

742c547015f8 · 10GB ·

model

arch: gemma4 · parameters: 19.9B · quantization: Q5_K_M · size: 10GB

template

<start_of_turn>{{- if or (eq .Role "system") (eq .Role "user") }}user {{- if (eq .Role "system") }}

585B

Gemma 4 26B-A4B 98e v7-coder — science-augmented code prune

20.8B params · 98 experts (30 dropped) · ~4B active · code map + targeted_gpqa

A research checkpoint that takes Gemma-4-26B-A4B-it and drops 30 of 128 experts per layer, leaving 98. The prune uses a code-targeted recipe on the rebuilt v7 competence maps (audited producers, 10 classes, multilingual category) with a [24,40] per-layer floor, plus a targeted_gpqa class at weight 1.5 that protects a science-specialist keep-set derived from GPQA-diamond pass-traces. Router, attention, and norms are the same as the base, and the model carries the mandatory shared-FFN α=1.2 upweight common to every coder variant.

A coder that kept all its science: it holds v6-coder’s top-tier code profile while pulling GPQA-diamond to 70.71%, which is +9.6pp over v6-coder (61.11) and at parity with the unpruned 128e (67.17 on the same Q6_K run). For maximal coding throughput with science left at baseline, see the sibling v7-coderx.

Full model card & methodology: ManniX-ITA/gemma-4-A4B-98e-v7-coder-it on Hugging Face.

Other formats:
- GGUF (29 tiers, imatrix, CD-* per-layer mixes + F16 + mmproj): ManniX-ITA/gemma-4-A4B-98e-v7-coder-it-GGUF
- NVFP4A16 (native vLLM, ~13 GB): ManniX-ITA/gemma-4-A4B-98e-v7-coder-NVFP4A16

Scores (Q6_K, llama.cpp, greedy, same host)

GPQA-D	AIME	MATH-500	GSM8K	ARC	IFEval	HE	HE+	LCB-55	LCB-100	MultiPL-E
70.71	76.67	92.00	93.00	94.80	95.00	98.78	92.68	96.36	97.00	88.67

Reference columns on the same Q6_K run: unpruned 128e GPQA 67.17 / AIME 73.33 / HE 97.56 / LCB-55 96.36; v6-coder GPQA 61.11 / HE 98.17 / LCB-55 92.73. v7-coder matches or beats 128e on GPQA, AIME and GSM8K while holding the cohort code profile. (Small benches — GPQA 198q, AIME 30q — carry run-to-run variance; read GPQA/AIME as “recovered the science gap”, not a robust win over the base.)
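The caveat about small benches can be made concrete with a rough calculation. The sketch below uses the question counts and scores quoted above and treats each bench as a set of independent pass/fail trials, which is a simplifying assumption (greedy decoding is not literally a random draw). The function names are illustrative only.

```python
import math

# Question counts are the ones stated in the text; scores are percentages
# taken from the Q6_K table and the reference line.
BENCHES = {
    "GPQA-D": {"n": 198, "v7-coder": 70.71, "128e": 67.17},
    "AIME": {"n": 30, "v7-coder": 76.67, "128e": 73.33},
}


def standard_error_pp(score_pct: float, n: int) -> float:
    """Binomial standard error of a pass rate, in percentage points."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def summarize(name: str) -> dict:
    """Gap to the unpruned 128e, in points and in whole questions."""
    bench = BENCHES[name]
    gap = bench["v7-coder"] - bench["128e"]
    return {
        "gap_pp": round(gap, 2),
        "gap_questions": round(gap / 100.0 * bench["n"]),
        "se_pp": round(standard_error_pp(bench["v7-coder"], bench["n"]), 2),
    }


for name in BENCHES:
    print(name, summarize(name))
```

Under these assumptions the GPQA gap corresponds to about 7 questions and the AIME gap to a single question, both close to one standard error, which is consistent with reading the result as a recovered gap and not as a robust win.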

GGUF tiers (size guide)

Per-tier HE+ was not swept for the v7 cohort — the Q6_K table above is the reference; quality below ~Q3 / 3-bit degrades on the Gemma 4 MoE, so prefer Q4_K_M or higher for production. Full 29-tier list (incl. CD-* per-layer mixes) is on the GGUF repo.

Tier	Size (GB)	bpw	Role
Q8_0	21.16	8.14	near-lossless
Q6_K	17.81	6.84	max fidelity (bench tier)
Q5_K_M	15.07	5.79	high quality
Q4_K_M	13.24	5.09	:latest — recommended default
Q4_K_S	12.21	4.69	compact 4-bit
⭐ IQ4_XS	11.01	4.23	safe sub-12 GB 4-bit
Q3_K_M	10.51	4.04	smallest comfortable
IQ3_XS	9.22	3.54	sub-10 GB
IQ2_M	8.22	3.16	sub-8.5 GB (degraded)
IQ2_S	7.83	3.01	smallest viable
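The size and bpw columns are related by simple arithmetic: file size is roughly parameters × bits per weight ÷ 8. The sketch below checks that relation against the table, assuming the 20.8B parameter count from the header line and decimal gigabytes (10^9 bytes). Both assumptions are mine; the listed sizes remain the authoritative figures.

```python
# Assumption: 20.8e9 parameters (header line) and decimal gigabytes.
PARAMS = 20.8e9

# (tier, listed size in GB, bits per weight), copied from the table above.
TIERS = [
    ("Q8_0", 21.16, 8.14),
    ("Q6_K", 17.81, 6.84),
    ("Q5_K_M", 15.07, 5.79),
    ("Q4_K_M", 13.24, 5.09),
    ("Q4_K_S", 12.21, 4.69),
    ("IQ4_XS", 11.01, 4.23),
    ("Q3_K_M", 10.51, 4.04),
    ("IQ3_XS", 9.22, 3.54),
    ("IQ2_M", 8.22, 3.16),
    ("IQ2_S", 7.83, 3.01),
]


def estimated_size_gb(bpw: float, params: float = PARAMS) -> float:
    """Estimate file size in GB from bits per weight."""
    return params * bpw / 8 / 1e9


for tier, listed, bpw in TIERS:
    est = estimated_size_gb(bpw)
    print(f"{tier:8s} listed {listed:6.2f} GB  estimated {est:6.2f} GB")
```

The estimates land within a few hundredths of a GB of the listed sizes, so the same formula gives a reasonable first guess for tiers on the GGUF repo that are not shown here.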

Pull

ollama pull mannix/gemma4-98e-v7-coder            # :latest = Q4_K_M
ollama pull mannix/gemma4-98e-v7-coder:IQ4_XS     # safe 4-bit, sub-12 GB
ollama pull mannix/gemma4-98e-v7-coder:Q6_K       # max fidelity (bench tier)
ollama pull mannix/gemma4-98e-v7-coder:IQ2_S      # sub-8 GB
ollama pull mannix/gemma4-98e-v7-coder:vision-Q4_K_M   # + SigLIP vision tower
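As a rough aid for choosing among the text-only tags above, the sketch below picks the largest tier whose file fits a memory budget. It uses only the tags shown in the pull commands and their sizes from the tier table. The `headroom_gb` default is a hypothetical allowance for KV cache and runtime overhead, not a measured figure, and `pick_tag` and `pull_command` are illustrative helper names.

```python
# Tags shown in the pull commands, with file sizes from the tier table,
# ordered from largest to smallest. "latest" is the Q4_K_M tier.
PULL_TAGS = [
    ("Q6_K", 17.81),
    ("latest", 13.24),
    ("IQ4_XS", 11.01),
    ("IQ2_S", 7.83),
]

MODEL = "mannix/gemma4-98e-v7-coder"


def pick_tag(budget_gb: float, headroom_gb: float = 2.0):
    """Return the largest tag that fits the budget, or None if none fits.

    headroom_gb is an assumed allowance for context and runtime overhead.
    """
    usable = budget_gb - headroom_gb
    for tag, size_gb in PULL_TAGS:
        if size_gb <= usable:
            return tag
    return None


def pull_command(tag: str) -> str:
    """Build the matching ollama pull command line."""
    return f"ollama pull {MODEL}:{tag}"


tag = pick_tag(16.0)
if tag is not None:
    print(pull_command(tag))
```

Actual memory use grows with context length, so treat the result as a starting point and adjust the headroom for your own setup.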

Inherits Gemma 4’s thinking format — serve with the reasoning parser enabled (--reasoning-format deepseek --reasoning-budget 8192 on llama-server).
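With the reasoning parser enabled, a client can keep the thinking text apart from the final answer. The sketch below assumes the model is served by llama-server on its default local port through the OpenAI-compatible chat endpoint, and that the deepseek reasoning format places the thinking text in a `reasoning_content` field of the message. The helper names and the sample response are hand-written for illustration and are not real model output.

```python
import json
import urllib.request


def build_request(prompt: str, base_url: str = "http://localhost:8080"):
    """Build a chat completion request (base_url is an assumed default)."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_reasoning(response: dict):
    """Return (thinking, answer) from a chat completion response."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content") or "", message.get("content") or ""


def ask(prompt: str, base_url: str = "http://localhost:8080"):
    """Send the prompt to a running server and split the reply."""
    with urllib.request.urlopen(build_request(prompt, base_url)) as resp:
        return split_reasoning(json.load(resp))


# Hand-written sample response, used here so the sketch runs offline.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "Reverse the list in place.",
                "content": "Use items.reverse().",
            }
        }
    ]
}
thinking, answer = split_reasoning(sample)
print(answer)
```

Calling `ask("...")` requires a server that is already running; without the reasoning parser, the thinking text would be expected to appear inline in `content` and the first element of the tuple would be empty.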

Derivative of Gemma 4 — Gemma Terms of Use.

A further-improved version of the Gemma 4 98e coder variant, positioned as the strongest coder in the 20B class.
