6 Downloads · Updated 2 days ago
b767f399e345 · 19GB
TRINITY MINI / 26B (8X3B) / I-QUANT
In testing, this model proved very performant for its size of 3 billion active parameters. It is freely available to use and, like gpt-oss, offers an MXFP4 format for high resource efficiency, which is my main reason for adding it to Ollama. To fit as many parameters into as little VRAM as possible, I-quants will also be listed.
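For a rough sense of how quantization affects the VRAM footprint, here is a back-of-the-envelope sketch. The bits-per-weight figures are my own approximations for illustration, not official numbers; real GGUF files vary with per-tensor type mixes and metadata overhead.

```python
# Approximate weight-storage size for a 26B-parameter model at
# several quantization bit widths. Bits-per-weight values below are
# assumptions for illustration; actual GGUF sizes will differ.
TOTAL_PARAMS = 26e9

QUANTS = {
    "MXFP4": 4.25,    # 4-bit values plus a shared per-block scale (approx.)
    "Q3_K_M": 3.9,    # 3-bit K-quant, mixed precision (approx.)
    "IQ3_XS": 3.3,    # 3-bit I-quant (approx.)
}

for name, bits in QUANTS.items():
    gib = TOTAL_PARAMS * bits / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB of weights")
```

This is weights only; the KV cache and runtime overhead add to the total, which is why a quant near the nominal GPU capacity may still spill over.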
Note that I-quants forfeit some token generation speed relative to K-quants in exchange for storage efficiency. This model has not yet been tested on Ollama; the 3-bit K-quant should fit into the VRAM of 16GB GPUs. If you wish to experiment with the MXFP4 model on 16GB of VRAM and it does not work on Ollama, you can manually offload all layers to the GPU in LM Studio; see the links below for the Huggingface card. These models were taken from the GGUF releases on Huggingface.
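Once a quant is pulled, a quick smoke test with the official `ollama` Python client should look something like the sketch below. The model tag here is a placeholder; substitute the actual tag of the quant you pulled.

```python
# Minimal smoke test using the official `ollama` Python client
# (pip install ollama). Assumes the Ollama server is running locally.
import ollama

response = ollama.chat(
    model="trinity-mini",  # hypothetical tag; replace with the real one
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response["message"]["content"])
```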
GGUF standard quantizations (bartowski):
GGUF MXFP4 quantization (noctrex):