ollama run JetBrains/mellum2-instruct-q4_k_m
ollama launch claude --model JetBrains/mellum2-instruct-q4_k_m
ollama launch codex-app --model JetBrains/mellum2-instruct-q4_k_m
ollama launch openclaw --model JetBrains/mellum2-instruct-q4_k_m
ollama launch hermes --model JetBrains/mellum2-instruct-q4_k_m
ollama launch codex --model JetBrains/mellum2-instruct-q4_k_m
ollama launch opencode --model JetBrains/mellum2-instruct-q4_k_m
This repository contains a GGUF Q4_K_M quantization of JetBrains/Mellum2-12B-A2.5B-Instruct, ready to run with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.
Mellum2 Instruct is a Mixture-of-Experts assistant model (64 experts, 8 activated per token, 131,072-token context) that answers directly, without an externalized chain of thought. For the full model description, evaluation results, and architecture details, see the original model card: JetBrains/Mellum2-12B-A2.5B-Instruct.
| Quantization | Description | Size | KLD vs BF16 ↓ | Top-token match ↑ |
|---|---|---|---|---|
| Q4_K_M (this repo) | 4-bit k-quant, balanced (recommended) | 8.1 GB | 0.106 | 87.2% |
| BF16 | 16-bit, no quantization (reference) | 24.3 GB | — | — |
| Q8_0 | 8-bit, effectively lossless | 12.9 GB | 0.016 | 95.2% |
| Q6_K | 6-bit k-quant, very high quality | 10.9 GB | 0.038 | 92.9% |
| MXFP4_MOE | MXFP4 4-bit on MoE experts, smallest | 7.0 GB | 0.166 | 84.2% |
KL divergence (KLD) and top-token agreement are measured against the BF16 logits on
Wikitext-2 (n_ctx=512). A lower KLD and a higher agreement both mean the quantized
model's predictions are closer to those of the unquantized model.
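To make the two metrics concrete, here is a toy sketch of how they can be computed for a handful of token positions. It is not the evaluation pipeline behind the table: the function name `kld_and_top_match`, the `;`-separated input format, and the probabilities are all made up for illustration, and it works on tiny probability vectors rather than real logits. Each input line holds the reference (BF16) distribution, then the quantized distribution over the same vocabulary.

```shell
#!/bin/sh
# Toy illustration only: hypothetical helper, made-up probabilities.
# Input: one token position per line, "P ; Q", where P is the reference
# (BF16) distribution and Q is the quantized model's distribution.
kld_and_top_match() {
  awk -F';' '
    {
      n = split($1, p, " ")   # reference distribution P
      split($2, q, " ")       # quantized distribution Q
      kl = 0; pbest = 1; qbest = 1
      for (i = 1; i <= n; i++) {
        # KL(P || Q) in nats; terms with p = 0 contribute nothing
        if (p[i] > 0) kl += p[i] * log(p[i] / q[i])
        if (p[i] > p[pbest]) pbest = i   # most likely token under P
        if (q[i] > q[qbest]) qbest = i   # most likely token under Q
      }
      total += kl
      if (pbest == qbest) same++
    }
    END {
      printf "mean KLD: %.4f\n", total / NR
      printf "top-token match: %.1f%%\n", 100 * same / NR
    }'
}

# Two positions over a 3-token vocabulary; the second one flips the top token.
printf '%s\n' '0.7 0.2 0.1 ; 0.6 0.3 0.1' '0.5 0.4 0.1 ; 0.4 0.5 0.1' \
  | kld_and_top_match
```

In this toy input, the second position has a small KLD but still changes the most likely token, which is why the table reports both numbers rather than only one.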
ollama create JetBrains/mellum2-instruct-q4_k_m -f Modelfile
ollama run JetBrains/mellum2-instruct-q4_k_m
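The `ollama create` command above reads a `Modelfile` from the current directory. This page does not show that file, so the following is only a minimal sketch of what it might contain. The GGUF file name and the `num_ctx` value are assumptions; substitute the file you actually downloaded and a context size your hardware can handle. Any chat template or sampling settings the model needs are not covered here.

```shell
#!/bin/sh
# Hypothetical file name: replace with the GGUF file you actually downloaded.
GGUF_FILE="./mellum2-instruct-q4_k_m.gguf"

# Write a minimal Modelfile that points Ollama at the local GGUF weights.
cat > Modelfile <<EOF
FROM ${GGUF_FILE}
# Optional, assumed value: set the context window explicitly.
# The model supports up to 131072 tokens, but larger values need more memory.
PARAMETER num_ctx 8192
EOF

# Show the generated file before passing it to "ollama create".
cat Modelfile
```

`FROM` with a local path and `PARAMETER num_ctx` are standard Modelfile instructions; everything else about the import depends on the metadata stored in the GGUF file.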
Released under the Apache 2.0 license.
For the full model card, evaluation results, and architecture details, refer to the original model: JetBrains/Mellum2-12B-A2.5B-Instruct.