
A suite of weighted quantizations of the Magnum V4 models, available in 9b, 12b, 22b, and 27b. Original models by anthracite-org on Hugging Face.



e1b7fcc59456 · 15GB · gemma2 · 27.2B · IQ4_XS
Template: {{- if .Suffix }}<|fim_prefix|>{{ .Prompt }}<|fim_suffix|>{{ .Suffix }}<|fim_middle|> {{- else if .M
System: Write {{char}}'s next reply in this fictional roleplay with {{user}}.
Params: { "stop": [ "<|im_start|>", "<|im_end|>" ] }

Readme

MAGNUM V4 / I-MATRIX / 9-27B / I-QUANT

A reliable storytelling model, and my personal model of choice even with static quants. Its dataset covers a lot of ground, so many references you make will be picked up even without lorebooks. To stuff as many parameters into as little VRAM as possible, weighted K-quants and I-quants are listed, along with the multiple model sizes the original creator has released. Note that I-quants forfeit some token generation speed relative to K-quants in exchange for better storage efficiency.
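
As a rough rule of thumb for weighing quants against VRAM, a GGUF's file size (and most of its VRAM footprint, before the KV cache) is about parameter count × bits per weight ÷ 8. The sketch below is only an estimate using approximate bits-per-weight figures; exact sizes vary slightly per quant, and context length adds overhead on top.

```python
# Rough size estimate: parameters x bits-per-weight / 8 bytes.
# The bits-per-weight values are approximate and ignore KV cache / runtime overhead.
APPROX_BPW = {
    "IQ2_S": 2.5,
    "IQ3_S": 3.44,
    "IQ4_XS": 4.25,
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
}

def approx_size_gb(params_billion: float, quant: str) -> float:
    """Approximate GGUF size in GB for a given quant type."""
    return params_billion * APPROX_BPW[quant] / 8

# 27.2B parameters at IQ4_XS lands near the ~15GB blob listed above.
print(f"{approx_size_gb(27.2, 'IQ4_XS'):.1f} GB")  # ~14.5 GB
```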

For your specific VRAM requirements, the following is recommended (a small lookup sketch in code follows the list):

For 4GB GPUs: IQ2_S. This collection is really aimed at 6GB and up, but if you still wish to use Magnum V4, the 2-bit I-quant is available. Do not expect general inference ability of any kind; treat this as strictly roleplay only.

For 6GB GPUs: 9b_IQ4_XS. It’ll work if it’s the only thing running. Video streaming may slow it down. If it does, try IQ3_S.

For 8GB GPUs: 12b_IQ4_XS. It runs fast enough that there is no need to drop to the 9b models.

For 12GB GPUs: 12b_Q6_K. It’ll work fine, though if you want to experiment, there are larger models listed that will still fit in VRAM.

For 16GB GPUs: 27b_Q3_K_M or 27b_IQ3_S. These are recommended, but if your GPU struggles with them, the 22b_Q4_K_M works.

For >=20GB GPUs: Any model listed will fit fine in VRAM.
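
For completeness, the recommendations above reduce to a simple lookup. This is only a sketch of the list as written; the strings are the quant labels used above, not tags you can pull directly.

```python
# Sketch of the VRAM-to-quant recommendations above as a lookup table.
# The strings are quant labels from the list above, not pull-able tags.
RECOMMENDED = [
    (20, "any quant listed"),
    (16, "27b_Q3_K_M or 27b_IQ3_S (fall back to 22b_Q4_K_M)"),
    (12, "12b_Q6_K"),
    (8,  "12b_IQ4_XS"),
    (6,  "9b_IQ4_XS (try IQ3_S if the GPU is busy)"),
    (4,  "IQ2_S (roleplay only)"),
]

def pick_quant(vram_gb: float) -> str:
    """Return the quant recommended above for a given amount of VRAM."""
    for threshold, choice in RECOMMENDED:
        if vram_gb >= threshold:
            return choice
    return "not enough VRAM for this collection"

print(pick_quant(10))  # -> 12b_IQ4_XS
```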

These models were imported from GGUF files hosted on Hugging Face.

Original model (anthracite-org):

GGUF weighted quantizations (mradermacher):

(Image: OBLIGATORY_PICTURE_MAGNUM.png)