6 Downloads · Updated 2 days ago
b767f399e345 · 19GB
TRINITY MINI / 26B (8X3B) / I-QUANT
In testing, this model proved very performant for its size of 3 billion active parameters. It is freely available to use and, like gpt-oss, offers an MXFP4 format for high resource efficiency, which is my main reason for adding it to Ollama. To fit as many parameters into as little VRAM as possible, I-quants will also be listed.
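For a rough sense of how quantization affects the VRAM footprint, here is a back-of-the-envelope sketch. The bits-per-weight figures are my own approximations for illustration, not official numbers; real GGUF files vary with per-tensor type mixes and metadata overhead.

```python
# Approximate weight-storage size for a 26B-parameter model at
# several quantization bit widths. Bits-per-weight values below are
# assumptions for illustration; actual GGUF sizes will differ.
TOTAL_PARAMS = 26e9

QUANTS = {
    "MXFP4": 4.25,    # 4-bit values plus a shared per-block scale (approx.)
    "Q3_K_M": 3.9,    # 3-bit K-quant, mixed precision (approx.)
    "IQ3_XS": 3.3,    # 3-bit I-quant (approx.)
}

for name, bits in QUANTS.items():
    gib = TOTAL_PARAMS * bits / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB of weights")
```

This is weights only; the KV cache and runtime overhead add to the total, which is why a quant near the nominal GPU capacity may still spill over.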
Note that I-quants forfeit some token generation speed relative to K-quants in exchange for storage efficiency. This model has not yet been tested on Ollama; the 3-bit K-quant should fit into the VRAM of 16GB GPUs. If you wish to experiment with the MXFP4 model on 16GB of VRAM and it does not work on Ollama, you can manually offload all layers to the GPU in LM Studio; see the links below for the Huggingface card. These models were taken from the GGUF releases on Huggingface.
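Once a quant is pulled, a quick smoke test with the official `ollama` Python client should look something like the sketch below. The model tag here is a placeholder; substitute the actual tag of the quant you pulled.

```python
# Minimal smoke test using the official `ollama` Python client
# (pip install ollama). Assumes the Ollama server is running locally.
import ollama

response = ollama.chat(
    model="trinity-mini",  # hypothetical tag; replace with the real one
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response["message"]["content"])
```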
GGUF standard quantizations (bartowski):
GGUF MXFP4 quantization (noctrex):