ollama run JetBrains/mellum2-thinking-bf16
ollama launch claude --model JetBrains/mellum2-thinking-bf16
ollama launch codex-app --model JetBrains/mellum2-thinking-bf16
ollama launch openclaw --model JetBrains/mellum2-thinking-bf16
ollama launch hermes --model JetBrains/mellum2-thinking-bf16
ollama launch codex --model JetBrains/mellum2-thinking-bf16
ollama launch opencode --model JetBrains/mellum2-thinking-bf16
This repository contains an unquantized BF16 GGUF build of
JetBrains/Mellum2-12B-A2.5B-Thinking, ready to run with
llama.cpp, Ollama, LM Studio, and
other GGUF-compatible runtimes.
Mellum2 Thinking is a Mixture-of-Experts reasoning model (64 experts, 8
activated per token, 131,072-token context) that emits its chain of thought
inside <think>...</think> blocks before the final answer. For the full model
description, evaluation results, and architecture details, see the original
model card: JetBrains/Mellum2-12B-A2.5B-Thinking.
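Because the reasoning arrives inline with the answer, client code often needs to separate the two. The following is a minimal sketch, assuming the raw completion is available as a plain string and that the `<think>` tags appear literally in it; the helper name `split_thinking` is hypothetical and not part of any runtime's API.

```python
import re

# Matches each <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, answer) extracted from raw model output.

    Hypothetical helper: assumes the tags are present verbatim in `text`.
    If no block is found, reasoning is an empty string.
    """
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_thinking(raw)
print(answer)
```

Some runtimes may strip or relocate the reasoning before returning a response, in which case this step is unnecessary.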
| Quantization | Description | Size | KLD vs BF16 ↓ | Top-token match ↑ |
|---|---|---|---|---|
| BF16 (this repo) | 16-bit, no quantization (reference) | 24.3 GB | — | — |
| Q8_0 | 8-bit, effectively lossless | 12.9 GB | 0.004 | 97.4% |
| Q6_K | 6-bit k-quant, very high quality | 10.9 GB | 0.014 | 95.1% |
| Q4_K_M | 4-bit k-quant, balanced (recommended) | 8.1 GB | 0.052 | 89.8% |
| MXFP4_MOE | MXFP4 4-bit on MoE experts, smallest | 7.0 GB | 0.088 | 87.3% |
KL divergence and top-token agreement are measured against the BF16 logits on
Wikitext-2 (n_ctx=512); lower KLD / higher agreement means closer to the
unquantized model.
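To make the two metrics concrete, here is a small sketch of how they could be computed from per-position logits. It rests on assumptions the text above does not spell out: KL is taken as KL(BF16 ‖ quantized) in nats and averaged over token positions, and top-token match is the fraction of positions where both models rank the same token first. The function names and toy logits are illustrative only; this is not the tooling used to produce the table.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) for a single token position, in nats."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def argmax(row):
    """Index of the largest value in the row."""
    return max(range(len(row)), key=row.__getitem__)


def compare(ref_rows, test_rows):
    """Return (mean KLD, top-token match rate) over all positions."""
    n = len(ref_rows)
    mean_kld = sum(kl_divergence(r, t) for r, t in zip(ref_rows, test_rows)) / n
    match = sum(argmax(r) == argmax(t) for r, t in zip(ref_rows, test_rows)) / n
    return mean_kld, match


# Toy logits over a 3-token vocabulary at 2 positions (made-up values).
ref_rows = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]
test_rows = [[2.0, 1.0, 0.0], [0.5, 0.2, 1.0]]
mean_kld, match = compare(ref_rows, test_rows)
print(f"mean KLD = {mean_kld:.4f}, top-token match = {match:.0%}")
```

In this toy case the first position is identical in both models and the second disagrees on the top token, so the match rate is 50% and the mean KLD is positive.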
ollama run hf.co/JetBrains/Mellum2-12B-A2.5B-Thinking-GGUF-BF16
Released under the Apache 2.0 license.