Updated 3 days ago

Gemma-4 A4B 26B pruned to 98 experts, v4

Capabilities: tools, thinking
ollama run mannix/gemma4-98e-v4:Q8_0
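Beyond the CLI, a tool-enabled model like this one can be driven through Ollama's `/api/chat` endpoint. Below is a minimal sketch of the request payload; the `get_weather` tool and its schema are hypothetical examples, and only the model name comes from this page.

```python
import json

# Hedged sketch: a /api/chat request payload with one tool definition.
# The get_weather function is a made-up example; the model expects tools
# in the OpenAI-style "function" format that Ollama uses.
payload = {
    "model": "mannix/gemma4-98e-v4:Q8_0",
    "messages": [
        {"role": "user", "content": "What's the weather in Rome?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    "stream": False,
}

# POST this JSON to a running Ollama server, e.g.:
#   curl http://localhost:11434/api/chat -d "$(python this_script.py)"
print(json.dumps(payload))
```

If the model decides to call the tool, the response's `message.tool_calls` field carries the function name and arguments to execute locally.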

Details


59ec01dfea3c · 21GB · gemma4 · 19.9B · Q8_0
Template (truncated): {{- if or .System .Tools }}<bos><|turn>system {{ if .System }}{{ .System }} {{ end }}{{- if .Tools }
Parameters (truncated): { "num_ctx": 256000, "repeat_last_n": 256, "repeat_penalty": 1.15, "stop": [

Readme

The gemma-4-A4B-98e-v4 is pruned specifically to keep general knowledge as wide as possible, unlike v3, which aimed at keeping reasoning intact. Its token usage is similar to the original 128e version and lower than v3, which needs about 1.7x as many tokens.

  HumanEval-chat token usage (164 problems × max=3072)

  ┌──────────────┬─────┬─────┬─────┬─────┬──────┬─────┐
  │   variant    │ min │ p10 │ p50 │ p90 │ max  │ avg │
  ├──────────────┼─────┼─────┼─────┼─────┼──────┼─────┤
  │ 128e @3072   │  35 │ 125 │ 314 │ 589 │  917 │ 334 │
  ├──────────────┼─────┼─────┼─────┼─────┼──────┼─────┤
  │ 98e-v4       │  35 │ 114 │ 304 │ 648 │  895 │ 340 │
  ├──────────────┼─────┼─────┼─────┼─────┼──────┼─────┤
  │ 98e-v3 @3072 │  35 │ 206 │ 490 │ 897 │ 1013 │ 512 │
  └──────────────┴─────┴─────┴─────┴─────┴──────┴─────┘
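The summary rows above could be produced along these lines from per-problem completion token counts; this is a sketch with a nearest-rank percentile and fabricated sample data, not the benchmark harness actually used.

```python
# Hedged sketch: summarizing per-problem token counts into
# min / p10 / p50 / p90 / max / avg, as in the table above.
def summarize(tokens):
    s = sorted(tokens)

    def pct(p):
        # nearest-rank percentile over the sorted counts
        k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
        return s[k]

    return {
        "min": s[0], "p10": pct(10), "p50": pct(50),
        "p90": pct(90), "max": s[-1],
        "avg": round(sum(s) / len(s)),
    }

counts = [35, 120, 310, 290, 640, 900, 410, 150]  # fabricated example data
print(summarize(counts))
```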

The template has been fixed for tool usage.

Model on HF:

https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v4-it

Full GGUF:

https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v4-it-GGUF