VenomBlood/yicoder-q4

123 Downloads Updated 3 months ago

Yi-Coder 9B Chat quantized to Q4_K_M with llama.cpp, reducing the model from ~18 GB (F16) to ~5.3 GB. Runs in about 8 GB of RAM. Tuned for generating Python tests with pytest, and prompted to return plain, ready-to-run code. Apache 2.0 licensed.

CLI

ollama run VenomBlood/yicoder-q4

cURL

curl http://localhost:11434/api/chat \
  -d '{
    "model": "VenomBlood/yicoder-q4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Python

from ollama import chat

response = chat(
    model='VenomBlood/yicoder-q4',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

JavaScript

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'VenomBlood/yicoder-q4',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Models

1 model

| Name              | Size  | Context | Input | Updated      |
|-------------------|-------|---------|-------|--------------|
| yicoder-q4:latest | 5.3GB | 128K    | Text  | 3 months ago |

Readme

# Yi-Coder 9B Chat — Q4_K_M

A quantized version of [01-ai/Yi-Coder-9B-Chat](https://huggingface.co/01-ai/Yi-Coder-9B-Chat),
converted to GGUF Q4_K_M format using llama.cpp. Optimized for deterministic code generation
and test writing on consumer hardware.

## Model Details

| Property       | Value                        |
|----------------|------------------------------|
| Base Model     | 01-ai/Yi-Coder-9B-Chat       |
| Format         | GGUF Q4_K_M                  |
| Quantization   | Q4_K_M (via llama.cpp)       |
| File Size      | ~5.3 GB                      |
| RAM Required   | ~8 GB                        |
| Context Window | 8192 tokens by default (`num_ctx`); up to 128K supported |
| License        | Apache 2.0                   |
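
The sizes quoted on this page are roughly what bits-per-weight arithmetic predicts. The sketch below assumes a nominal 9 billion parameters (the exact count is not stated here) and decimal gigabytes. F16 stores 16 bits per weight and Q8_0 stores 8.5 (32 int8 values plus one 16-bit scale per block); the Q4_K_M figure is simply derived from the ~5.3 GB file size.

```python
# Rough size arithmetic for the figures quoted on this page.
# ASSUMPTION: a nominal 9e9 parameters; the exact count is not given in this README.
NOMINAL_PARAMS = 9e9


def size_gb(bits_per_weight: float, params: float = NOMINAL_PARAMS) -> float:
    """Estimated file size in decimal GB for a given average bits per weight."""
    return params * bits_per_weight / 8 / 1e9


def bits_per_weight(file_size_gb: float, params: float = NOMINAL_PARAMS) -> float:
    """Average bits per weight implied by a file size in decimal GB."""
    return file_size_gb * 1e9 * 8 / params


f16_gb = size_gb(16.0)              # F16: 2 bytes per weight
q8_0_gb = size_gb(8.5)              # Q8_0: 34 bytes per block of 32 weights
q4_k_m_bpw = bits_per_weight(5.3)   # implied by the ~5.3 GB file

print(f"F16    ~{f16_gb:.2f} GB")
print(f"Q8_0   ~{q8_0_gb:.2f} GB")
print(f"Q4_K_M ~{q4_k_m_bpw:.2f} bits/weight")
```

Under that assumption the F16 and Q8_0 estimates land close to the ~18 GB and ~9.5 GB figures quoted for this model. The ~8 GB RAM figure also has to cover the context cache and runtime overhead, which this sketch does not model.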

## Usage

```bash
ollama run VenomBlood/yicoder-q4
```

### Parameters

| Parameter      | Value | Reason                           |
|----------------|-------|----------------------------------|
| temperature    | 0.1   | Near-deterministic code output   |
| top_p          | 0.9   | Focused sampling                 |
| repeat_penalty | 1.1   | Reduces repetition in code       |
| num_ctx        | 8192  | Safe context size for code tasks |
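
When calling the model through Ollama's REST API, the same settings can also be sent per request in the `options` field. A minimal sketch using only the standard library: it assumes an Ollama server on the default `localhost:11434`, and the helper names `build_chat_payload` and `ask` are illustrative rather than part of any library.

```python
import json
import urllib.request

# Values copied from the parameters table above.
OPTIONS = {
    "temperature": 0.1,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "num_ctx": 8192,
}


def build_chat_payload(prompt: str, model: str = "VenomBlood/yicoder-q4") -> dict:
    """Build a non-streaming /api/chat request body with the options above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": dict(OPTIONS),
        "stream": False,
    }


def ask(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST the payload to a local Ollama server and return the reply text."""
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Printing the payload needs no server; calling ask(...) does.
print(json.dumps(build_chat_payload("Write a pytest test for add(a, b)."), indent=2))
```

With the server running and the model pulled, `ask("Write a pytest test for add(a, b).")` should return the generated test as a string. Raising `num_ctx` above 8192 is possible, but memory use grows with it.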

## System Prompt

The bundled system prompt configures the model as an expert Python software
engineer specializing in comprehensive pytest test suites, and instructs it to
return only valid Python code, with no markdown fences or explanations.

## Example

```
>>> Write a pytest test for this function:
def add(a: int, b: int) -> int:
    return a + b

def test_add():
    assert add(1, 2) == 3
    assert add(-1, 1) == 0
    assert add(-1, -2) == -3
```

## Quantization Process

Quantized on Kaggle (Tesla T4 x2) using the following pipeline:

1. Downloaded the Yi-Coder-9B-Chat weights from Hugging Face
2. Converted them to a Q8_0 GGUF (~9.5 GB) as an intermediate file using `llama.cpp`
3. Quantized the Q8_0 file to Q4_K_M (~5.3 GB) as the final output
4. Smoke-tested with `llama-cli` before publishing

> Q8_0 was used as the intermediate format (instead of F16 at ~18 GB) to fit
> within Kaggle's free-tier disk limits.
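
The pipeline above might look roughly like the following. This is a sketch, not the exact commands used: the directory and intermediate file names are hypothetical, and it assumes a llama.cpp checkout that provides `convert_hf_to_gguf.py` plus built `llama-quantize` and `llama-cli` binaries. Quantizing from an already-quantized Q8_0 file typically needs `--allow-requantize`, which llama.cpp warns can cost some quality compared with starting from F16. The script only builds and prints the commands.

```python
import shlex

MODEL_DIR = "./Yi-Coder-9B-Chat"      # hypothetical local directory
Q8_FILE = "yi-coder-9b-q8_0.gguf"     # hypothetical intermediate file name
Q4_FILE = "yi-coder-9b-q4_k_m.gguf"   # file name used elsewhere in this README

# One command per pipeline step, as argument lists suitable for subprocess.run.
PIPELINE = [
    ["huggingface-cli", "download", "01-ai/Yi-Coder-9B-Chat", "--local-dir", MODEL_DIR],
    ["python", "convert_hf_to_gguf.py", MODEL_DIR, "--outfile", Q8_FILE, "--outtype", "q8_0"],
    ["llama-quantize", "--allow-requantize", Q8_FILE, Q4_FILE, "Q4_K_M"],
    ["llama-cli", "-m", Q4_FILE, "-p", "def add(a, b):", "-n", "32"],
]

for step, command in enumerate(PIPELINE, start=1):
    print(f"{step}. {shlex.join(command)}")
```

To execute a step rather than print it, pass the list to `subprocess.run(command, check=True)` from the llama.cpp directory. Deleting the downloaded weights after step 2 is one way to stay within a tight disk budget.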

## Use with Modelfile

```bash
# Download GGUF + Modelfile
huggingface-cli download AkshajSeerpu/yi-coder-9b-q4km-gguf \
  yi-coder-9b-q4_k_m.gguf Modelfile --local-dir ./

# Import into Ollama
ollama create yicoder-q4 -f Modelfile

# Run
ollama run yicoder-q4
```
