JetBrains/mellum2-base-q4_k_m

24 Downloads Updated 1 week ago

Low-latency base LLM for code completion by JetBrains

ollama run JetBrains/mellum2-base-q4_k_m

curl http://localhost:11434/api/chat \
  -d '{
    "model": "JetBrains/mellum2-base-q4_k_m",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='JetBrains/mellum2-base-q4_k_m',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'JetBrains/mellum2-base-q4_k_m',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Models

| Name | Size | Context | Input | Updated |
|---|---|---|---|---|
| mellum2-base-q4_k_m:latest | 8.1GB | 128K | Text | 1 week ago |

Readme

# Mellum2 Base — Q4_K_M

This repository contains a **GGUF Q4_K_M** quantization of
[`JetBrains/Mellum2-12B-A2.5B-Base`](https://huggingface.co/JetBrains/Mellum2-12B-A2.5B-Base), ready to run with
[`llama.cpp`](https://github.com/ggml-org/llama.cpp), Ollama, LM Studio, and
other GGUF-compatible runtimes.

Mellum2 Base is the pretrained Mixture-of-Experts foundation model (64 experts, 8
activated per token, 131,072-token context) behind the Mellum2 family. It is a
raw causal language model intended for **fill-in-the-middle (FIM) code
completion** and as a starting point for fine-tuning — it is *not*
instruction-tuned and has no chat template. For the full model description and
architecture details, see the original model card:
**[JetBrains/Mellum2-12B-A2.5B-Base](https://huggingface.co/JetBrains/Mellum2-12B-A2.5B-Base)**.

## Available quantizations

| Quantization | Description | Size |
|---|---|---|
| **`Q4_K_M` (this repo)** | 4-bit k-quant, balanced (recommended) | 8.1 GB |

## Fill-in-the-middle (FIM) format

This is a base completion model: the client supplies the fully formatted prompt
and the model continues from `<fim_middle>`. Mellum uses a suffix-prefix-middle
ordering and supports optional repository context via `<filename>` tags:

```
<filename>path/to/file.py
<fim_suffix>{code after the cursor}<fim_prefix>{code before the cursor}<fim_middle>
```

The model generates the code that belongs at the cursor and emits `<|endoftext|>`
when done.
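
As a minimal sketch of the template above, the prompt can be assembled in client code. The helper names `build_fim_prompt` and `trim_completion` are hypothetical and not part of any Mellum2 or Ollama API. The sketch assumes a single optional `<filename>` line, exactly as in the template, and that the end-of-text marker may or may not appear in the text the runtime returns.

```python
from typing import Optional

# Control tokens as written in the template above.
FILENAME = "<filename>"
FIM_SUFFIX = "<fim_suffix>"
FIM_PREFIX = "<fim_prefix>"
FIM_MIDDLE = "<fim_middle>"
END_OF_TEXT = "<|endoftext|>"


def build_fim_prompt(prefix: str, suffix: str, filename: Optional[str] = None) -> str:
    """Format a prompt in suffix-prefix-middle order (hypothetical helper)."""
    # The filename line is optional; when present it ends with a newline.
    header = f"{FILENAME}{filename}\n" if filename else ""
    return f"{header}{FIM_SUFFIX}{suffix}{FIM_PREFIX}{prefix}{FIM_MIDDLE}"


def trim_completion(text: str) -> str:
    """Cut generated text at the end-of-text marker, if the runtime left it in."""
    return text.split(END_OF_TEXT, 1)[0]


prompt = build_fim_prompt(
    prefix="def fib(n):\n",          # code before the cursor
    suffix="\n\nprint(fib(10))",     # code after the cursor
    filename="fib.py",
)
print(prompt)
```

The suffix comes first in the string even though it follows the cursor in the file; only the order of the control tokens matters to the model.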

## Run with Ollama

```sh
ollama create JetBrains/mellum2-base-q4_k_m -f Modelfile
ollama run JetBrains/mellum2-base-q4_k_m
```

`ollama create` builds a local model from a `Modelfile` that points at the downloaded GGUF file. If the model is pulled from the Ollama registry instead, `ollama run` alone downloads it on first use.

Because this is a raw completion model, send a prompt that already contains the
FIM control tokens, for example via the API with `raw` mode, which passes the
prompt to the model without applying any prompt template:

```sh
curl http://localhost:11434/api/generate -d '{
  "model": "JetBrains/mellum2-base-q4_k_m",
  "raw": true,
  "stream": false,
  "prompt": "<filename>fib.py\n<fim_suffix>\n\nprint(fib(10))<fim_prefix>def fib(n):\n<fim_middle>"
}'
```
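
The same request can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not a reference client: `build_payload`, `complete`, and `main` are hypothetical names, the server is assumed to listen on the default `localhost:11434`, and the `num_predict` and `stop` options are added here as a precaution and are not part of the original request.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "JetBrains/mellum2-base-q4_k_m"


def build_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build the JSON body for a raw, non-streaming completion request."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "raw": True,       # send the prompt as-is, without a prompt template
        "stream": False,   # return one JSON object instead of a stream
        # Assumption: cap the output length and stop at the end-of-text marker.
        "options": {"num_predict": max_tokens, "stop": ["<|endoftext|>"]},
    }


def complete(prompt: str, max_tokens: int = 128) -> str:
    """POST the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(prompt, max_tokens)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]


def main() -> None:
    """Run the fib.py example from the curl command above."""
    prompt = (
        "<filename>fib.py\n<fim_suffix>\n\nprint(fib(10))"
        "<fim_prefix>def fib(n):\n<fim_middle>"
    )
    print(complete(prompt))
```

Call `main()` with a local Ollama server running and the model available. The returned text is the code for the cursor position, to be inserted between the prefix and the suffix.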

## License

Released under the Apache 2.0 license.

---

*For the full model card and architecture details, refer to the original model:
[JetBrains/Mellum2-12B-A2.5B-Base](https://huggingface.co/JetBrains/Mellum2-12B-A2.5B-Base).*
