MAGNUM V4 / I-MATRIX / 9-27B / I-QUANT
A reliable storytelling model, and a personal model of choice even in its static quants. Its training data is broad, so many references you make will be picked up even without lorebooks. To fit as many parameters as possible into limited VRAM, weighted K-quants and I-quants are listed, along with several of the distillations the original creator has released. Note that I-quants give up some token generation speed relative to K-quants in exchange for smaller file sizes.
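To make the size trade-off concrete, here is a back-of-the-envelope sketch: quantized weight size is roughly parameters times bits-per-weight divided by 8. The bits-per-weight figures below are approximate llama.cpp averages, not exact values for these files, and real VRAM use adds KV cache and compute buffers on top.

```python
# Rough weight size for a quantized GGUF model:
# params (billions) * bits-per-weight / 8 = GB of weights.
# BPW values are approximations, not exact figures for these files.
APPROX_BPW = {
    "IQ2_S": 2.5, "IQ3_S": 3.45, "IQ4_XS": 4.25,
    "Q3_K_M": 3.9, "Q4_K_M": 4.85, "Q6_K": 6.6,
}

def approx_weight_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * APPROX_BPW[quant] / 8

# e.g. a 12B model at IQ4_XS is ~6.4 GB of weights -- tight but workable on 8GB.
print(f"{approx_weight_gb(12, 'IQ4_XS'):.1f} GB")
```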
For your specific VRAM budget, the following is recommended (a toy selector that implements this table follows the list):
For 4GB GPUs: IQ2_S. This collection normally assumes at least 6GB, but if you wish to run Magnum V4 anyway, the 2-bit I-quant is available. Do not expect general inference ability of any kind; treat this one as strictly roleplay only.
For 6GB GPUs: 9b_IQ4_XS. It’ll work if it’s the only thing running. Video streaming may slow it down. If it does, try IQ3_S.
For 8GB GPUs: 12b_IQ4_XS. It runs fast enough on 8GB GPUs without needing to drop to the 9b models.
For 12GB GPUs: 12b_Q6_K. It’ll work fine, though if you want to experiment there are larger models listed that will fit in VRAM.
For 16GB GPUs: 27b_Q3_K_M or 27b_IQ3_S. These are recommended, but if your GPU struggles with this, the 22b_Q4_K_M works.
For >=20GB GPUs: Any model listed will fit fine in VRAM.
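The same arithmetic, turned into the toy selector mentioned above. This is a sketch under assumptions: the bits-per-weight values are approximate, the 1 GB headroom for KV cache and OS buffers is a guess, and the tags are illustrative rather than this collection's exact names.

```python
# Toy selector over the table above: return the largest listed quant whose
# estimated weight size fits the available VRAM minus headroom.
RECOMMENDED = [  # (params in B, quant tag, approx bits/weight), small -> large
    (9,  "IQ2_S",  2.5),
    (9,  "IQ4_XS", 4.25),
    (12, "IQ4_XS", 4.25),
    (12, "Q6_K",   6.6),
    (27, "IQ3_S",  3.45),
    (27, "Q3_K_M", 3.9),
]

def pick_quant(vram_gb: float, headroom_gb: float = 1.0) -> str:
    budget = vram_gb - headroom_gb
    best = f"{RECOMMENDED[0][0]}b_{RECOMMENDED[0][1]}"  # smallest as fallback
    for params_b, quant, bpw in RECOMMENDED:
        if params_b * bpw / 8 <= budget:
            best = f"{params_b}b_{quant}"
    return best

print(pick_quant(6))   # 9b_IQ4_XS
print(pick_quant(8))   # 12b_IQ4_XS
print(pick_quant(16))  # 27b_Q3_K_M
```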
These models were pulled as GGUF files from Hugging Face.
Original model (anthracite-org):
GGUF weighted quantizations (mradermacher):
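Once a quant is pulled, a minimal sketch of streaming a reply through the official `ollama` Python client (`pip install ollama`); the model tag below is a placeholder, so substitute whichever quant you actually pulled.

```python
# Stream a chat reply from a locally pulled quant via the ollama Python client.
import ollama

stream = ollama.chat(
    model="magnum-v4:12b_IQ4_XS",  # hypothetical tag -- adjust to your pull
    messages=[{"role": "user", "content": "Open a scene in a rain-soaked city."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```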