
Series of 37b-48b experimental models. Made by SteelStorage (Hugging Face).


Models


33 models

ZEPHYRIA-Mistral_Small:37b_IQ1_M

8.8GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:37b_IQ2_XXS

10GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:37b_IQ3_XXS

14GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:37b_Q3_K_M

18GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:37b_Q4_K_M

22GB · 128K context window · Text · 3 days ago

ZEPHYRIA-Mistral_Small:37b_Q5_K_M

26GB · 128K context window · Text · 3 days ago

ZEPHYRIA-Mistral_Small:37b_Q6_K

31GB · 128K context window · Text · 3 days ago

ZEPHYRIA-Mistral_Small:39b_IQ1_M

9.1GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:39b_IQ2_XXS

10GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:39b_IQ3_XXS

15GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:39b_Q2_K

14GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:39b_Q3_K_M

19GB · 128K context window · Text · 2 days ago

ZEPHYRIA-Mistral_Small:39b_Q4_K_S

22GB · 128K context window · Text · 2 days ago

ZEPHYRIA-Mistral_Small:42b_IQ1_M

9.9GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:42b_IQ2_XXS

11GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:42b_IQ4_XS

23GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:42b_Q3_K_S

18GB · 128K context window · Text · 2 days ago

ZEPHYRIA-Mistral_Small:42b_Q5_K_M

30GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:42b_Q6_K

35GB · 128K context window · Text · 2 days ago

ZEPHYRIA-Mistral_Small:45b_IQ1_M

10GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:45b_IQ2_S

14GB · 128K context window · Text · 2 days ago

ZEPHYRIA-Mistral_Small:45b_IQ4_XS

24GB · 128K context window · Text · 2 days ago

ZEPHYRIA-Mistral_Small:45b_Q3_K_M

21GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:45b_Q3_K_S

19GB · 128K context window · Text · 2 days ago

ZEPHYRIA-Mistral_Small:45b_Q5_K_S

31GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:45b_Q6_K

36GB · 128K context window · Text · 2 days ago

ZEPHYRIA-Mistral_Small:48b_IQ1_M

11GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:48b_IQ2_XS

14GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:48b_IQ3_XXS

19GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:48b_Q3_K_S

21GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:48b_Q4_K_M

29GB · 128K context window · Text · 4 days ago

ZEPHYRIA-Mistral_Small:48b_Q5_K_M

34GB · 128K context window · Text · 2 days ago

ZEPHYRIA-Mistral_Small:48b_Q6_K

40GB · 128K context window · Text · 2 days ago

Readme

ZEPHYRIA / I-MATRIX / 37-48B / I-QUANT

This model has been recommended often in forums and by word of mouth. It comes in multiple versions, each finetuned from Mistral Small Instruct with a different layer-duplication scheme to suit different tasks. The versions are as follows (a toy sketch of the duplication idea follows the list):

Early Duplication: The largest version at 48 billion parameters, with roughly two thirds of its layers duplicated; it is said to benefit “low-level feature processing.”

Balanced with Extended Duplication: The 45-billion-parameter model, which leans more heavily on duplicated layers earlier in the finetuning process. Better suited to complex generative tasks and may be the go-to for storytelling.

Mid Duplication: The 42-billion-parameter model, with a 1:1 ratio of unique to duplicated layers. Presented as a general-purpose model.

Balanced: The 39-billion-parameter model, with an approximate 1:1:1 ratio of duplicated, non-duplicated, and unique layers after the process. On low-VRAM devices this is the recommended general-purpose model.

Late Duplication: The 37-billion-parameter model, with a 4:3 ratio of non-duplicated to duplicated layers. “Ideal for tasks requiring extensive unique feature processing,” per the model card.
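
For readers curious what “early” versus “late” duplication means in practice, the sketch below builds a hypothetical passthrough-style layer plan in which one slice of the original stack is repeated. The layer count, slice boundaries, and function name are illustrative assumptions, not the author’s actual merge recipe.

```python
# Hypothetical illustration of "early" vs "late" layer duplication in a
# passthrough-style self-merge. The numbers are made up for clarity and are
# NOT the recipe used for the ZEPHYRIA models.

def duplication_plan(n_layers: int, dup_start: int, dup_end: int) -> list[int]:
    """Return the sequence of source-layer indices after duplicating
    the half-open range [dup_start, dup_end) once."""
    plan = list(range(n_layers))                  # original layer stack
    duplicated = list(range(dup_start, dup_end))  # slice to repeat
    # Insert the duplicated slice immediately after its first occurrence.
    return plan[:dup_end] + duplicated + plan[dup_end:]

base_layers = 56  # illustrative layer count for a Mistral-Small-class model

early = duplication_plan(base_layers, 0, 20)    # repeat early layers
late = duplication_plan(base_layers, 36, 56)    # repeat late layers

for name, plan in [("early", early), ("late", late)]:
    dup_count = len(plan) - base_layers
    print(f"{name}: {len(plan)} layers total, {dup_count} duplicated")
```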

To stuff as many parameters into as little VRAM as possible, weighted K-quants and I-quants are listed for each version the creator featured. Since this is an “experimental” model, the medium 1-bit I-quants are included just for fun. They may prove effective: heavily quantized larger models can still retain enough parameters to avoid generating garbled nonsense (see Ubergarm’s 671b 1-bit DeepSeek), though whether that holds at 37-48b remains to be seen. Note that I-quants trade some token generation speed relative to K-quants for storage efficiency.
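
As a rough sanity check on how aggressive these quantizations are, you can back out approximate bits per weight from the file sizes shown in the listing above. The sketch below does this for a few tags; decimal gigabytes and the nominal parameter counts from the tag names are assumptions, so treat the results as estimates.

```python
# Back-of-the-envelope bits-per-weight (bpw) from the listed file sizes.
# Real bpw differs slightly because some tensors are kept at higher precision.

GB = 1e9  # assuming the listing reports decimal gigabytes

models = {
    "37b_IQ1_M":  (8.8, 37e9),
    "37b_Q4_K_M": (22,  37e9),
    "48b_IQ2_XS": (14,  48e9),
    "48b_Q6_K":   (40,  48e9),
}

for tag, (size_gb, params) in models.items():
    bpw = size_gb * GB * 8 / params
    print(f"{tag}: ~{bpw:.2f} bits per weight")
```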

As this is a collection of large models, VRAM requirements climb quickly once you look for more performant quantizations. The following recommendations are tailored to consumer setups (a rough fit-check sketch follows the list):

For >=48 (2x24, 4x12, etc.) GB GPUs: Anything. Anything will be fine.

For 40 (2x20, etc.) GB GPUs: Any non-8-bit model should work. If you want the largest, early-duplication model at 6-bit, you may have to set layer splits manually to prevent any CPU offloading; to err on the side of caution, stick to the 5-bit quants for the largest model.

For 32 (2x16, 1x32, etc.) GB GPUs: 5-bit quantizations below 48 billion parameters will fit in 32GB VRAM. Note that the smallest model can be used at 6-bit.

For 24 (2x12, 1x24, etc.) GB GPUs: The small 3-bit K-quant at 48b, the medium 3-bit K-quant at 45b, the extra-small 4-bit I-quant at 42b, the small 4-bit K-quant at 39b, or the medium 4-bit K-quant at 37b.

For 20GB GPUs: The small 3-bit K-quant for the 42b model, or the medium 3-bit K-quant for the 39b and 37b models, will fit into 20GB. If you prefer lower perplexity over faster generation, use the small 3-bit I-quants. The XXS 3-bit I-quant also lets you run the 48b model.

For 16GB GPUs: The XXS 3-bit I-quant for the 37b and 39b models, and the 2-bit I-quants for the 42b, 45b, and 48b models (IQ2_XXS, IQ2_S, and IQ2_XS respectively), all fit in 16GB.

For 12GB GPUs: The XXS 2-bit I-quants for the 37b, 39b, and 42b models will work fine.

For 10GB GPUs: Take a look at the 1-bit I-quants to see if they produce any worthwhile results.
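
The recommendations above boil down to comparing file size against your VRAM budget while leaving headroom for the KV cache and activations. The sketch below is a crude filter along those lines; the 15% headroom figure is an assumption (it grows with context length), and the sizes are copied from the listing above.

```python
# Crude VRAM fit check: a quant "fits" if its GGUF file size plus a fixed
# headroom for KV cache / activations stays under the budget. Sizes in GB
# are taken from the listing above (subset shown); the 15% headroom is an
# assumption and increases with the context length you actually run.

QUANTS = {
    "48b_Q6_K": 40, "48b_Q5_K_M": 34, "48b_Q4_K_M": 29, "48b_Q3_K_S": 21,
    "48b_IQ3_XXS": 19, "48b_IQ2_XS": 14,
    "45b_Q6_K": 36, "45b_Q5_K_S": 31, "45b_Q3_K_M": 21, "45b_IQ2_S": 14,
    "42b_Q6_K": 35, "42b_Q5_K_M": 30, "42b_IQ4_XS": 23, "42b_Q3_K_S": 18,
    "39b_Q4_K_S": 22, "39b_Q3_K_M": 19, "39b_IQ2_XXS": 10,
    "37b_Q6_K": 31, "37b_Q4_K_M": 22, "37b_Q3_K_M": 18, "37b_IQ2_XXS": 10,
}

def fits(budget_gb: float, headroom: float = 0.15) -> list[str]:
    """Return the quants whose file size leaves `headroom` of the budget free."""
    usable = budget_gb * (1 - headroom)
    return sorted(tag for tag, size in QUANTS.items() if size <= usable)

for budget in (12, 16, 24, 32, 48):
    print(f"{budget}GB:", ", ".join(fits(budget)))
```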

These models were imported from GGUF quantizations hosted on Hugging Face.

Original model (SteelStorage):

GGUF weighted quantizations (mradermacher):
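
If you want to drive one of these tags from Python, a minimal sketch using the Ollama Python client is shown below. It assumes the `ollama` package is installed, an Ollama server is running locally, and the tag has already been pulled; the exact namespace of the tag may differ depending on where the collection is published.

```python
# Minimal sketch of chatting with one of the listed tags via the Ollama
# Python client (pip install ollama). Assumes a local Ollama server and that
# the tag below has been pulled; swap in whichever quantization fits your VRAM.

from ollama import chat

response = chat(
    model="ZEPHYRIA-Mistral_Small:39b_Q3_K_M",  # tag from the listing; may need a user/namespace prefix
    messages=[{"role": "user", "content": "Outline a short storytelling prompt."}],
)
print(response["message"]["content"])
```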

OBLIGATORY_PICTURE_ZEPHYRIA.png