Low-latency base LLM for code completion by JetBrains

ollama run JetBrains/mellum2-base-q4_k_m

Mellum2 Base — Q4_K_M

This repository contains a GGUF Q4_K_M quantization of JetBrains/Mellum2-12B-A2.5B-Base, ready to run with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.

Mellum2 Base is the pretrained Mixture-of-Experts foundation model (64 experts, 8 activated per token, 131,072-token context) behind the Mellum2 family. It is a raw causal language model intended for fill-in-the-middle (FIM) code completion and as a starting point for fine-tuning — it is not instruction-tuned and has no chat template. For the full model description and architecture details, see the original model card: JetBrains/Mellum2-12B-A2.5B-Base.

Available quantizations

Quantization | Description | Size
Q4_K_M (this repo) | 4-bit k-quant, balanced (recommended) | 8.1 GB

Fill-in-the-middle (FIM) format

This is a base completion model: the client supplies the fully formatted prompt and the model continues from <fim_middle>. Mellum uses a suffix-prefix-middle ordering and supports optional repository context via <filename> tags:

<filename>path/to/file.py
<fim_suffix>{code after the cursor}<fim_prefix>{code before the cursor}<fim_middle>

The model generates the code that belongs at the cursor and emits <|endoftext|> when done.
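
The suffix-prefix-middle layout described above can be assembled programmatically. The following is a minimal Python sketch, not an official JetBrains client: `build_fim_prompt` and `splice_completion` are hypothetical helper names introduced here for illustration. It assumes the `<filename>` line ends with a newline, as in the template above, and that the end marker may appear as literal text in the returned completion.

```python
from typing import Optional

# Control tokens as they appear in the FIM template above.
FILENAME = "<filename>"
FIM_SUFFIX = "<fim_suffix>"
FIM_PREFIX = "<fim_prefix>"
FIM_MIDDLE = "<fim_middle>"
END_OF_TEXT = "<|endoftext|>"


def build_fim_prompt(prefix: str, suffix: str, filename: Optional[str] = None) -> str:
    """Assemble a prompt in suffix-prefix-middle order (hypothetical helper)."""
    # The filename line is optional repository context.
    header = f"{FILENAME}{filename}\n" if filename else ""
    return f"{header}{FIM_SUFFIX}{suffix}{FIM_PREFIX}{prefix}{FIM_MIDDLE}"


def splice_completion(prefix: str, completion: str, suffix: str) -> str:
    """Insert the generated middle at the cursor (hypothetical helper)."""
    # Keep only the text before the end marker, if the marker is present.
    middle = completion.split(END_OF_TEXT, 1)[0]
    return prefix + middle + suffix


prompt = build_fim_prompt(
    prefix="def fib(n):\n",
    suffix="\n\nprint(fib(10))",
    filename="fib.py",
)
print(prompt)
```

Keeping the token strings in named constants makes the ordering explicit: the suffix comes first, then the prefix, and the prompt always ends with `<fim_middle>` so that generation starts at the cursor.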

Run with Ollama

ollama create JetBrains/mellum2-base-q4_k_m -f Modelfile
ollama run JetBrains/mellum2-base-q4_k_m

Because this is a raw completion model, send a prompt that already contains the FIM control tokens, for example via the API with raw mode:

curl http://localhost:11434/api/generate -d '{
  "model": "JetBrains/mellum2-base-q4_k_m",
  "raw": true,
  "stream": false,
  "prompt": "<filename>fib.py\n<fim_suffix>\n\nprint(fib(10))<fim_prefix>def fib(n):\n<fim_middle>"
}'
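
The same request can be sent from Python using only the standard library. This is a hedged sketch that mirrors the curl call above: it assumes Ollama is listening on its default local port and that the non-streaming reply is a JSON object whose `response` field holds the generated text. `build_payload` and `generate` are hypothetical helper names, not part of any Ollama or JetBrains package.

```python
import json
import urllib.request

# Assumed defaults, copied from the curl example above.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "JetBrains/mellum2-base-q4_k_m"


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Build the request body; raw mode sends the prompt without a template."""
    return {"model": model, "raw": True, "stream": False, "prompt": prompt}


def generate(prompt: str, url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """POST a pre-formatted FIM prompt and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumption: the non-streaming reply carries the text in "response".
    return body["response"]
```

With a local Ollama server running, calling `generate("<filename>fib.py\n<fim_suffix>\n\nprint(fib(10))<fim_prefix>def fib(n):\n<fim_middle>")` should return the code the model proposes for the cursor position.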

License

Released under the Apache 2.0 license.

