8 Downloads · Updated 2 days ago · 33 models
ZEPHYRIA-Mistral_Small:37b_IQ1_M
8.8GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:37b_IQ2_XXS
10GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:37b_IQ3_XXS
14GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:37b_Q3_K_M
18GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:37b_Q4_K_M
22GB · 128K context window · Text · 3 days ago
ZEPHYRIA-Mistral_Small:37b_Q5_K_M
26GB · 128K context window · Text · 3 days ago
ZEPHYRIA-Mistral_Small:37b_Q6_K
31GB · 128K context window · Text · 3 days ago
ZEPHYRIA-Mistral_Small:39b_IQ1_M
9.1GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:39b_IQ2_XXS
10GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:39b_IQ3_XXS
15GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:39b_Q2_K
14GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:39b_Q3_K_M
19GB · 128K context window · Text · 2 days ago
ZEPHYRIA-Mistral_Small:39b_Q4_K_S
22GB · 128K context window · Text · 2 days ago
ZEPHYRIA-Mistral_Small:42b_IQ1_M
9.9GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:42b_IQ2_XXS
11GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:42b_IQ4_XS
23GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:42b_Q3_K_S
18GB · 128K context window · Text · 2 days ago
ZEPHYRIA-Mistral_Small:42b_Q5_K_M
30GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:42b_Q6_K
35GB · 128K context window · Text · 2 days ago
ZEPHYRIA-Mistral_Small:45b_IQ1_M
10GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:45b_IQ2_S
14GB · 128K context window · Text · 2 days ago
ZEPHYRIA-Mistral_Small:45b_IQ4_XS
24GB · 128K context window · Text · 2 days ago
ZEPHYRIA-Mistral_Small:45b_Q3_K_M
21GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:45b_Q3_K_S
19GB · 128K context window · Text · 2 days ago
ZEPHYRIA-Mistral_Small:45b_Q5_K_S
31GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:45b_Q6_K
36GB · 128K context window · Text · 2 days ago
ZEPHYRIA-Mistral_Small:48b_IQ1_M
11GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:48b_IQ2_XS
14GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:48b_IQ3_XXS
19GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:48b_Q3_K_S
21GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:48b_Q4_K_M
29GB · 128K context window · Text · 4 days ago
ZEPHYRIA-Mistral_Small:48b_Q5_K_M
34GB · 128K context window · Text · 2 days ago
ZEPHYRIA-Mistral_Small:48b_Q6_K
40GB · 128K context window · Text · 2 days ago
ZEPHYRIA / I-MATRIX / 37-48B / I-QUANT
This model has been recommended often in forums and even in person. It comes in multiple versions, each finetuned from Mistral Small Instruct for different tasks. The listed versions are as follows (a short sketch after this list illustrates how the duplication ratios can be read):
Early Duplication: The 48-billion-parameter and largest version, in which around two thirds of the layers have been duplicated; this is said to benefit "low-level feature processing."
Balanced with Extended Duplication: The 45-billion-parameter model, which leans more heavily on layers duplicated earlier in the finetuning process. Better suited to complex generative tasks; may be the go-to for storytelling.
Mid Duplication: The 42-billion-parameter model, with a 1:1 ratio between unique and duplicated layers. Presented as a general-purpose model.
Balanced: The 39-billion-parameter model, with an approximate 1:1:1 ratio between duplicated, non-duplicated, and unique layers after the process. On low-VRAM devices, this is the recommended general-purpose model.
Late Duplication: The 37-billion-parameter model, with a 4:3 ratio between non-duplicated and duplicated layers. "Ideal for tasks requiring extensive unique feature processing," the card says.
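The card does not spell out how the duplication was performed; self-merges like this are commonly built by passthrough-style layer stacking, so the sketch below is only a minimal illustration of how the quoted ratios work out. The 56-layer base and the duplicated range are hypothetical values, not figures from the card.

```python
# Illustrative only: the base layer count and duplicated range below are
# hypothetical, not taken from the model card.
def duplication_plan(n_base_layers, duplicated_ranges):
    """Count layers for a passthrough-style self-merge that keeps every base
    layer once and additionally repeats the given half-open layer ranges."""
    n_duplicated = sum(stop - start for start, stop in duplicated_ranges)
    n_total = n_base_layers + n_duplicated   # original stack plus the copies
    n_unique = n_base_layers - n_duplicated  # base layers that appear only once
    return n_total, n_duplicated, n_unique

# "Mid Duplication" reading: duplicate the middle half of a hypothetical
# 56-layer base, giving a 1:1 ratio of unique to duplicated layers.
total, dup, uniq = duplication_plan(56, [(14, 42)])
print(f"{total} layers total, unique:duplicated = {uniq}:{dup}")  # 84 layers, 28:28
```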
To stuff as many parameters into as little VRAM as possible, weighted K-quants and I-quants are listed for each version the creator featured. Also, since this is an "experimental" model, the medium 1-bit I-quants are included just for fun. They may prove effective: heavily quantized larger models can still retain enough parameters to avoid generating garbled nonsense (see Ubergarm's 671b 1-bit DeepSeek); whether that holds at 37-48B remains to be seen. Note that I-quants give up some token-generation speed relative to K-quants in exchange for storage efficiency.
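As a rough sanity check on why these file sizes land where they do: on-disk size is roughly parameter count times average bits per weight, divided by eight. A minimal sketch follows, using commonly quoted approximate bits-per-weight figures for llama.cpp quant types; the figures are assumptions, not values from this card, and they vary a little by architecture.

```python
# Back-of-the-envelope GGUF size: parameters x average bits-per-weight / 8.
# The bits-per-weight figures are approximate averages for llama.cpp quant
# types and vary slightly by architecture -- treat them as assumptions.
APPROX_BPW = {"IQ1_M": 1.75, "IQ2_XXS": 2.06, "IQ3_XXS": 3.06,
              "Q4_K_M": 4.85, "Q6_K": 6.56}

def approx_size_gb(n_params: float, quant: str) -> float:
    """Approximate on-disk size in gigabytes for a given quant type."""
    return n_params * APPROX_BPW[quant] / 8 / 1e9

for quant in ("IQ1_M", "Q4_K_M", "Q6_K"):
    print(f"48B {quant}: ~{approx_size_gb(48e9, quant):.1f} GB")
# IQ1_M at 48B comes out around 10.5 GB, close to the 11GB file listed above;
# Q6_K lands around 39 GB against the listed 40GB.
```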
As this is a large model collection, VRAM requirements climb quickly once you look for performant quantizations. The following recommendations are tailored for consumer setups; a rough fit-check sketch follows the list:
For >=48 (2x24, 4x12, etc.) GB GPUs: Anything. Anything will be fine.
For 40 (2x20, etc.) GB GPUs: Any non-8-bit model should work. If you wish to run the largest, early-duplication model at 6-bit, you may have to set the layer split manually to prevent any CPU offloading. To err on the side of caution, use the 5-bit models if you want the largest version.
For 32 (2x16, 1x32, etc.) GB GPUs: The 5-bit quantizations below 48 billion parameters will fit in 32GB of VRAM. Note that the smallest model can also be used at 6-bit.
For 24 (2x12, 1x24, etc.) GB GPUs: The small 3-bit K-quant at 48b, the medium 3-bit K-quant at 45b, the extra-small 4-bit I-quant at 42b, the small 4-bit K-quant at 39b, or the medium 4-bit K-quant at 37b.
For 20GB GPUs: The small 3-bit K-quant for the 42b model, or the medium 3-bit K-quant for the 39b and 37b models, will fit into 20GB. If you would prefer lower perplexity over faster generation, use the 3-bit I-quants instead. You can also use the XXS 3-bit I-quant if you want to run the 48b model.
For 16GB GPUs: The XXS 3-bit I-quants for the 37b and 39b models, and the listed 2-bit I-quants for the 42b (XXS), 45b (S), and 48b (XS) models, all fit in 16GB.
For 12GB GPUs: The XXS 2-bit I-quants for the 42b model and below will work fine in 12GB.
For 10GB GPUs: Take a look at the 1-bit I-quants to see if they produce any worthwhile results.
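To narrow these options down quickly, the sketch below filters the listed file sizes against a VRAM budget. The 15% headroom figure and the subset of tags included are my own assumptions; the sizes themselves come from the tag list at the top of this page.

```python
# File sizes in GB copied from the tag list above (a representative subset).
LISTED_SIZES_GB = {
    "37b_Q3_K_M": 18, "37b_Q4_K_M": 22, "37b_Q6_K": 31,
    "39b_IQ3_XXS": 15, "39b_Q3_K_M": 19, "39b_Q4_K_S": 22,
    "42b_IQ2_XXS": 11, "42b_IQ4_XS": 23, "42b_Q5_K_M": 30,
    "45b_IQ2_S": 14, "45b_Q3_K_M": 21, "45b_Q5_K_S": 31,
    "48b_IQ3_XXS": 19, "48b_Q3_K_S": 21, "48b_Q4_K_M": 29,
}

def fits(vram_gb: float, headroom: float = 0.15):
    """Return listed tags whose weights fit after reserving a headroom
    fraction (assumed 15%) for KV cache, activations, and runtime overhead."""
    budget = vram_gb * (1.0 - headroom)
    return sorted((t for t, gb in LISTED_SIZES_GB.items() if gb <= budget),
                  key=LISTED_SIZES_GB.get, reverse=True)

print(fits(24))              # conservative candidates for a single 24GB card
print(fits(24, headroom=0))  # "does the file fit at all" reading
```

With the headroom set to zero this reproduces the looser "does the file fit at all" reading used in the recommendations above; in practice, leave some room for the KV cache, especially at long contexts.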
These models were sourced as GGUF quantizations from Hugging Face.
Original model (SteelStorage):
GGUF weighted quantizations (mradermacher):