Suite of weighted quantizations for Magnum V4 models. Available in 9b, 12b, 22b, and 27b. Made by Anthracite-org (Huggingface).


Readme

MAGNUM V4 / I-MATRIX / 9-27B / I-QUANT

A reliable storytelling model, and a personal model of choice with static quants. The model's training data covers a lot of ground, so many references you make will be picked up even without lorebooks. To stuff as many parameters into as little VRAM as possible, weighted K-quants and I-quants are listed across the multiple model sizes the original creator offers. Note that I-quants trade some token generation speed relative to K-quants for better storage efficiency.
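As a rough rule of thumb, a quant's weight footprint is parameters × bits-per-weight ÷ 8, before KV cache and context buffers. A minimal sketch of that arithmetic; the bits-per-weight table is approximate and assumed, not exact llama.cpp figures:

```python
# Rough GGUF weight-size estimate: params * bits-per-weight / 8.
# The bits-per-weight figures below are approximations for these quant
# types (assumed values), and KV cache / context buffers come on top.
APPROX_BPW = {
    "Q6_K": 6.6,
    "Q4_K_M": 4.8,
    "Q4_K_S": 4.6,
    "IQ4_XS": 4.3,
    "Q3_K_M": 3.9,
    "IQ3_S": 3.4,
    "IQ2_S": 2.5,
}

def approx_weight_gib(params_billion: float, quant: str) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bits = params_billion * 1e9 * APPROX_BPW[quant]
    return total_bits / 8 / 1024**3

# e.g. the 12.2B model at IQ4_XS lands around 6 GiB of weights,
# which is why it is the 8GB recommendation below.
print(f"{approx_weight_gib(12.2, 'IQ4_XS'):.1f} GiB")
```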

For your available VRAM, the following is recommended (a small helper sketch follows this list):

For 4GB GPUs: IQ2_S. This collection is really aimed at 6GB and up, but if you wish to use Magnum V4 on 4GB, the 2-bit I-quant is available. Do not expect general inference ability of any kind; treat this as strictly roleplay-only.

For 6GB GPUs: 9b_IQ4_XS. It’ll work if it’s the only thing running. Video streaming may slow it down. If it does, try IQ3_S.

For 8GB GPUs: 12b_IQ4_XS. It works fast enough on 8GB GPUs without needing to drop to the 9b models.

For 12GB GPUs: 12b_Q6_K. It’ll work fine, though if you want to experiment there are larger models listed that will fit in VRAM.

For 16GB GPUs: 27b_Q3_K_M or 27b_IQ3_S. Either is recommended, but if your GPU struggles with them, 22b_Q4_K_M works.

For >=20GB GPUs: Any model listed will fit fine in VRAM.
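The sketch below mirrors the list above as a tiny lookup helper. The tag names follow the size_quant pattern used in these recommendations and are assumptions about how this collection is actually tagged:

```python
def recommended_tag(vram_gib: int) -> str:
    """Map available VRAM (GiB) to the tag suggested in this readme.

    Tag names follow the <size>_<quant> pattern used above and are
    assumptions about how the collection is published."""
    if vram_gib >= 16:
        # 27b_Q3_K_M is the alternative; drop to 22b_Q4_K_M if the GPU
        # struggles. At 20 GiB and up, any tag in the collection fits.
        return "27b_IQ3_S"
    if vram_gib >= 12:
        return "12b_Q6_K"
    if vram_gib >= 8:
        return "12b_IQ4_XS"
    if vram_gib >= 6:
        return "9b_IQ4_XS"   # try IQ3_S if other apps share the GPU
    return "IQ2_S"           # 4 GiB: strictly roleplay, expect little else

print(recommended_tag(8))  # -> 12b_IQ4_XS
```

Whichever tag you pick, `ollama run` with that tag will pull and load it.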

These models were taken from GGUF files hosted on Huggingface.

Original model (anthracite-org):

GGUF weighted quantizations (mradermacher):

OBLIGATORY_PICTURE_MAGNUM.png