This repository contains a GGUF Q4_K_M quantization of JetBrains/Mellum2-12B-A2.5B-Base, ready to run with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.
Mellum2 Base is the pretrained Mixture-of-Experts foundation model (64 experts, 8 activated per token, 131,072-token context) behind the Mellum2 family. It is a raw causal language model intended for fill-in-the-middle (FIM) code completion and as a starting point for fine-tuning — it is not instruction-tuned and has no chat template. For the full model description and architecture details, see the original model card: JetBrains/Mellum2-12B-A2.5B-Base.
| Quantization | Description | Size |
|---|---|---|
| Q4_K_M (this repo) | 4-bit k-quant, balanced (recommended) | 8.1 GB |
This is a base completion model: the client supplies the fully formatted prompt and the model continues from `<fim_middle>`. Mellum uses a suffix-prefix-middle (SPM) ordering and supports optional repository context via `<filename>` tags:
```
<filename>path/to/file.py
<fim_suffix>{code after the cursor}<fim_prefix>{code before the cursor}<fim_middle>
```
The model generates the code that belongs at the cursor and emits `<|endoftext|>` when done.
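A minimal Python sketch of building this prompt, assuming the editor has already split the current file at the cursor into `prefix` and `suffix`. The function name `build_fim_prompt` is hypothetical. The card only states that repository context is passed via `<filename>` tags, so the layout used here for extra context files (each as a `<filename>path` line followed by that file's contents, placed before the current file) is an assumption, not a documented format.

```python
from typing import Sequence, Tuple


def build_fim_prompt(
    path: str,
    prefix: str,
    suffix: str,
    context_files: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a suffix-prefix-middle FIM prompt for Mellum2 Base (sketch).

    path          -- name of the file being edited
    prefix        -- code before the cursor
    suffix        -- code after the cursor
    context_files -- optional (path, content) pairs from the same repository;
                     their placement and layout are an ASSUMPTION
    """
    parts = []
    for ctx_path, ctx_content in context_files:
        # Assumed layout: one "<filename>path\ncontent" block per context file,
        # terminated by a newline so the next tag starts on its own line.
        if not ctx_content.endswith("\n"):
            ctx_content += "\n"
        parts.append(f"<filename>{ctx_path}\n{ctx_content}")
    # Current file in the card's order: suffix, then prefix, then <fim_middle>,
    # which is where the model starts generating.
    parts.append(
        f"<filename>{path}\n<fim_suffix>{suffix}<fim_prefix>{prefix}<fim_middle>"
    )
    return "".join(parts)


prompt = build_fim_prompt("fib.py", prefix="def fib(n):\n", suffix="\n\nprint(fib(10))")
```

With these arguments, `prompt` is the same string that the `curl` example below sends in its `"prompt"` field (after JSON unescaping).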
```shell
ollama create JetBrains/mellum2-base-q4_k_m -f Modelfile
ollama run JetBrains/mellum2-base-q4_k_m
```
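Because this is a raw completion model, send a prompt that already contains the FIM control tokens, for example through the `/api/generate` endpoint with `"raw": true`, which tells Ollama to pass the prompt through without applying a template: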
```shell
curl http://localhost:11434/api/generate -d '{
  "model": "JetBrains/mellum2-base-q4_k_m",
  "raw": true,
  "stream": false,
  "prompt": "<filename>fib.py\n<fim_suffix>\n\nprint(fib(10))<fim_prefix>def fib(n):\n<fim_middle>"
}'
```
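The same raw-mode request can be sent from Python using only the standard library. This sketch assumes a local Ollama server on its default port `11434` and that the non-streaming reply carries the generated text in its `response` field; the helper names `build_request_body`, `extract_completion`, and `complete` are hypothetical. Cutting the text at `<|endoftext|>` is a defensive assumption, in case the runtime does not already stop on that token.

```python
import json
import urllib.request

# Assumed default local Ollama endpoint; adjust host/port for your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "JetBrains/mellum2-base-q4_k_m"


def build_request_body(prompt: str) -> bytes:
    # raw=True: Ollama applies no template, so the FIM control tokens in
    # `prompt` reach the model unchanged. stream=False: one JSON reply.
    payload = {"model": MODEL, "raw": True, "stream": False, "prompt": prompt}
    return json.dumps(payload).encode("utf-8")


def extract_completion(reply: dict) -> str:
    text = reply.get("response", "")
    # Defensive (assumption): drop anything after the end-of-text token.
    return text.split("<|endoftext|>", 1)[0]


def complete(prompt: str, timeout: float = 120.0) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request_body(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_completion(json.load(resp))
```

With a server running, passing the prompt from the `curl` example above to `complete()` should return the model's proposed code for the cursor position, which the editor then inserts between the prefix and the suffix.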
Released under the Apache 2.0 license.
For the full model card and architecture details, refer to the original model: JetBrains/Mellum2-12B-A2.5B-Base.