69 Downloads Updated 6 hours ago
ollama run mannix/gemma4-98e-v5-coder:Q2_K_L
c0969deb3529 · 8.6GB
20.8B parameters · 98 experts (30 dropped) · code-axis drop map
Research checkpoint that takes Gemma-4-26B-A4B-it and drops 30 of 128 experts per layer using a code-targeted recipe (C6 layer-relevance-weighted v4-floor, breadth=50). Same router, same attention, same norms as base — only the expert keep-set changes. Compared to v4, this one protects code/math experts more tightly per layer.
Full model card, methodology, ablations, contamination audit: ManniX-ITA/gemma-4-A4B-98e-v5-coder-it on Hugging Face.
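To make the keep-set idea concrete, here is a minimal sketch of top-k routing restricted to the kept experts. It is an illustration under assumptions, not the author's implementation: the function name `route_with_keep_set`, the filtering of dropped experts before top-k, the softmax renormalisation, and the `top_k` value are all hypothetical.

```python
import math

def route_with_keep_set(router_logits, keep_set, top_k):
    """Pick top_k experts among the kept ones and renormalise their weights.

    Assumption: dropped experts are simply excluded before top-k selection.
    """
    kept = [(logit, idx) for idx, logit in enumerate(router_logits) if idx in keep_set]
    if top_k > len(kept):
        raise ValueError("top_k exceeds the number of kept experts")
    chosen = sorted(kept, reverse=True)[:top_k]
    # Softmax over the chosen logits only, shifted by the max for stability.
    peak = max(logit for logit, _ in chosen)
    exps = [(math.exp(logit - peak), idx) for logit, idx in chosen]
    total = sum(e for e, _ in exps)
    return [(idx, e / total) for e, idx in exps]

# Toy layer with 4 experts where expert 3 has been dropped (hypothetical numbers).
logits = [0.1, 2.0, 1.5, 3.0]
print(route_with_keep_set(logits, keep_set={0, 1, 2}, top_k=2))
```

Even though expert 3 has the highest logit, it is never selected because it is not in the keep-set; the token falls through to the next-best kept experts.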
| Format | Repo | Notes |
|---|---|---|
| GGUF (this repo, llama.cpp / ollama) | ManniX-ITA/gemma-4-A4B-98e-v5-coder-it-GGUF | Bartowski tier sweep (Q2_K → Q8_0, IQ-series) + 5 ContribDynamic CD-* per-layer quants. F16 baseline included. |
| NVFP4A16 (vLLM) | ManniX-ITA/gemma-4-A4B-98e-v5-coder-NVFP4A16 | ~13 GB, native vLLM, produced via modelopt==0.43.0. |
| BF16 source weights | ManniX-ITA/gemma-4-A4B-98e-v5-coder-it | 20.8B bf16; base for any further surgery / quant. |
Pick v5-coder for: Python / JS / Rust code generation, HumanEval / LCB workloads, MATH-500-class problems. Wins on every code bench and on MATH-500.
Pick v4 for: general-purpose chat when you don't specifically need the code lean. Reasoning, general-knowledge, and instruction-following benches on v5-coder are roughly flat relative to v4 (all within ±1 point).
# recommended default for most setups (≈14 GB VRAM)
ollama pull mannix/gemma4-98e-v5-coder:Q4_K_M
# best quality at moderate size (≈17 GB VRAM)
ollama pull mannix/gemma4-98e-v5-coder:Q6_K
# size-conscious (≈8 GB VRAM) — minimal quality loss on code
ollama pull mannix/gemma4-98e-v5-coder:CD-Q3_K_M
CD-* variants are ContribDynamic per-layer mixed quants — expert layers get more bits, attention/norm less. Roughly 5–10% faster than the matching plain quant at similar quality on code tasks.
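As a rough aid for choosing among the three tags listed above, the sketch below picks the largest one that fits a given VRAM budget. The helper `pick_tag` is hypothetical, and it treats the card's approximate VRAM figures as hard thresholds, which is a simplification.

```python
# Approximate VRAM needs (GB) copied from the pull commands above.
# Ordered from largest to smallest so the first fit is the best fit.
TAGS_BY_VRAM_GB = [
    (17, "mannix/gemma4-98e-v5-coder:Q6_K"),
    (14, "mannix/gemma4-98e-v5-coder:Q4_K_M"),
    (8, "mannix/gemma4-98e-v5-coder:CD-Q3_K_M"),
]

def pick_tag(available_vram_gb):
    """Return the largest listed tag that fits, or None if none does."""
    for needed_gb, tag in TAGS_BY_VRAM_GB:
        if available_vram_gb >= needed_gb:
            return tag
    return None

print(pick_tag(16))  # a 16 GB card falls between the Q6_K and Q4_K_M figures
```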
Benchmarks were run on the NVFP4A16 build under vLLM with greedy decoding and a thinking-token budget of 12 288, apples-to-apples against v4. Full per-task settings, output-length distributions, and the HumanEval contamination smell test are on the HF card.
| Benchmark (n) | 128e ref | 98e v4 | 98e v5-coder | Δ (v5c − v4) |
|---|---|---|---|---|
| HumanEval-164 chat (pass@1) | 96.95 | 96.95 | 98.17 | +1.22 |
| HumanEval+-164 chat (pass@1) | 92.07 | 91.46 | 92.68 | +1.22 |
| LCB-medium-55 v4 (pass@1) | 87.27 | 78.18 | 85.45 | +7.27 |
| MATH-500-100 (math_verify) | 89.00 | 89.00 | 92.00 | +3.00 |
| IFEval-100 (prompt_strict) | 95.00 | 93.00 | 94.00 | +1.00 |
| AIME 2024 (30) | 36.67 | 36.67 | 36.67 | 0.00 |
| GSM8K-100 (flex) | 91.00 | 86.00 | 86.00 | 0.00 |
| GPQA Diamond (198, flex) | 73.23 | 69.19 | 68.69 | −0.50 |
Reading the deltas: the code and math wins are clean (HE +1.22, HE+ +1.22, LCB-medium +7.27, MATH-500 +3.00). Reasoning, general-knowledge, and instruction-following results stay flat (all within ±1 point). The +7.27 on LCB, which corresponds to four more problems solved out of 55, is well outside the ±2pp single-run noise floor on that bench. Recovering code performance is the recipe's design intent.
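On benches this small it helps to read the percentages as problem counts. The sketch below converts the pass@1 figures from the table back into solved problems; the helper name `solved` is hypothetical, and rounding to the nearest integer assumes each reported score is a single-run fraction of the stated problem count.

```python
def solved(pass_at_1_pct, n_problems):
    """Convert a pass@1 percentage into a whole number of solved problems."""
    return round(pass_at_1_pct / 100 * n_problems)

LCB_N = 55  # LCB-medium problem count from the table
v4 = solved(78.18, LCB_N)
v5c = solved(85.45, LCB_N)

# Problems gained, and the percentage-point weight of a single problem.
print(v5c - v4, round(100 / LCB_N, 2))  # prints: 4 1.82
```

One LCB-medium problem is worth about 1.82 points, so the +7.27 gap is four problems, while the +1.22 on HumanEval (164 problems) is two.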
Full comparison + caveats (especially the LCB apples-to-oranges note) on the HF card.
| Model | Params | HumanEval | HumanEval+ | MATH-500 | GPQA-D | IFEval |
|---|---|---|---|---|---|---|
| 98e v5-coder (this) | 20.8B / 4B MoE | 98.17 | 92.68 | 92.00 | 68.69 | 94.00 |
| Phi-4 | 14B dense | 82.6 | — | 80.4 | 56.1 | 63.0 |
| Qwen2.5-14B-Instruct | 14.7B dense | 81.7–86.2 | — | 73.0 | 40.9 | 80.0 |
| Qwen2.5-Coder-14B-Instruct | 14.7B dense | 89.6 | 87.2 | — | — | — |
| Codestral-22B v1 | 22B dense | 81.1 | — | — | — | — |
| Mistral-Small-3 | 24B dense | ~84 | — | 70.6 | 45.3 | 82.1 |
LCB across these models is not apples-to-apples — different problem subsets / time windows. See HF card for the breakdown.
Uses the Gemma 4 chat template with tool-use support and a 2nd-turn workaround for nested function calls. Default parameters baked into every tag:
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.15
PARAMETER repeat_last_n 256
PARAMETER num_ctx 256000
PARAMETER stop <turn|>
PARAMETER stop <|tool_response>
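These baked-in defaults can be overridden per request through the `options` field of Ollama's `/api/generate` endpoint. The following is a minimal sketch using only the standard library; it assumes a local Ollama server on the default port, and the helper names `build_request` and `generate` as well as the reduced `num_ctx` of 32768 are illustrative choices, not recommendations from the model card.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint

def build_request(prompt, model="mannix/gemma4-98e-v5-coder:Q4_K_M", **overrides):
    """Build a non-streaming generate request with per-call option overrides."""
    options = {"num_ctx": 32768}  # illustrative: smaller than the baked-in 256000
    options.update(overrides)
    payload = {"model": model, "prompt": prompt, "stream": False, "options": options}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt, **overrides):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **overrides)) as resp:
        return json.loads(resp.read())["response"]

# Example call, commented out because it requires a running Ollama instance:
# print(generate("Write a Rust function that reverses a string.", temperature=0.2))
```

Options that are not overridden keep the values baked into the tag.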
Gemma Terms of Use. Use of this model implies acceptance.
mannix/gemma4-98e-v4 · mannix/gemma4-98e