8 models
TRINITY_MINI-26b:MXFP4 · 15GB · 128K context window · Text
TRINITY_MINI-26b:Q3_K_XL · 13GB · 128K context window · Text
TRINITY_MINI-26b:Q6_K_L · 22GB · 128K context window · Text
TRINITY_MINI-26b:Q3_K_S · 12GB · 128K context window · Text
TRINITY_MINI-26b:Q5_K_M · 19GB · 128K context window · Text
TRINITY_MINI-26b:Q8_0 · 28GB · 128K context window · Text
TRINITY_MINI-26b:IQ3_XS · 11GB · 128K context window · Text
TRINITY_MINI-26b:IQ4_XS · 14GB · 128K context window · Text
TRINITY MINI / 26B (8X3B) / I-QUANT
In testing, this model proved very performant for its 3-billion active parameter count. It is freely available and, like gpt-oss, is offered in an MXFP4 format for high resource efficiency, which is my main reason for adding it to Ollama. To fit as many parameters as possible into as little VRAM as possible, I-quants are listed as well.
Note that I-quants give up some token generation speed relative to K-quants in exchange for storage efficiency. This model has not yet been tested on Ollama; the 3-bit K-quants should fit into VRAM on 16GB GPUs. If you wish to experiment with the MXFP4 model on 16GB of VRAM and it does not work on Ollama, you can manually offload all layers to the GPU in LM Studio; see the links below for the Hugging Face cards. These models were imported from GGUF files hosted on Hugging Face.
GGUF standard quantizations (bartowski):
GGUF MXFP4 quantization (noctrex):
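Quick-start sketch (untested on Ollama, as noted above): the tags below match the list at the top of this page; if this model is published under a user namespace on Ollama, prefix the model name accordingly.

```shell
# Pull the 3-bit K-quant (~12GB), which should fit in 16GB of VRAM
ollama pull TRINITY_MINI-26b:Q3_K_S

# Chat with it interactively
ollama run TRINITY_MINI-26b:Q3_K_S

# In another terminal: check how much of the model was offloaded to the GPU
ollama ps
```

If `ollama ps` reports less than 100% GPU, try a smaller quant from the list (IQ3_XS at 11GB is the smallest) or reduce the context length.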