100 1 month ago

Yi-Coder 9B Chat quantized to Q4_K_M using llama.cpp, reducing size from ~18 GB to ~5.3 GB. Runs on 8 GB RAM. Optimized for Python test generation with pytest. Produces clean, ready-to-run code. Apache 2.0 licensed.

ollama run VenomBlood/yicoder-q4

Details

1692161374a1 · 5.3GB · llama · 8.83B · Q4_K_M
{{- range .Messages }}<|im_start|>{{ .Role }} {{ .Content }}<|im_end|> {{ end }}<|im_start|>assistan
{ "num_ctx": 8192, "repeat_penalty": 1.1, "stop": [ "<|im_start|>", "<|i

Readme

Yi-Coder 9B Chat — Q4_K_M

A quantized version of 01-ai/Yi-Coder-9B-Chat, converted to GGUF Q4_K_M format using llama.cpp. Configured, through its sampling parameters and system prompt, for near-deterministic code generation and test writing on consumer hardware.

Model Details

| Property | Value |
|---|---|
| Base Model | 01-ai/Yi-Coder-9B-Chat |
| Format | GGUF Q4_K_M |
| Quantization | Q4_K_M (via llama.cpp) |
| File Size | ~5.3 GB |
| RAM Required | ~8 GB |
| Context Window | 8192 tokens configured (base model supports up to 128K) |
| License | Apache 2.0 |

Usage

ollama run VenomBlood/yicoder-q4

Parameters

| Parameter | Value | Reason |
|---|---|---|
| temperature | 0.1 | Near-deterministic code output |
| top_p | 0.9 | Focused sampling |
| repeat_penalty | 1.1 | Reduces repetition in code |
| num_ctx | 8192 | Safe context size for code tasks |
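
These values are baked into the model, but they can also be passed explicitly when calling a local Ollama server over HTTP. The following is a minimal sketch, assuming Ollama is listening on its default local address and that the model was pulled under the name `VenomBlood/yicoder-q4`; the helper names `build_payload` and `generate` are hypothetical and used only for illustration.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Values copied from the Parameters table above.
OPTIONS = {
    "temperature": 0.1,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "num_ctx": 8192,
}


def build_payload(prompt: str, model: str = "VenomBlood/yicoder-q4") -> dict:
    """Build a non-streaming /api/generate request body."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": dict(OPTIONS),
    }


def generate(prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `generate("Write a pytest test for this function: ...")` should return the generated test code as a string. Options sent in a request override the model's defaults for that call only.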

System Prompt

This model is configured as an expert Python software engineer specializing in writing comprehensive pytest test suites. It returns only valid Python code with no markdown fences or explanations.

Example

>>> Write a pytest test for this function:
def add(a: int, b: int) -> int:
    return a + b

def test_add():
    assert add(1, 2) == 3
    assert add(-1, 1) == 0
    assert add(-1, -2) == -3

Quantization Process

Quantized on Kaggle (Tesla T4 x2) using the following pipeline:

  1. Downloaded Yi-Coder-9B-Chat weights from Hugging Face
  2. Converted to Q8_0 GGUF (~9.5 GB) as an intermediate file using llama.cpp
  3. Quantized Q8_0 → Q4_K_M (~5.3 GB) final output
  4. Smoke-tested with llama-cli before publishing

Q8_0 was used as the intermediate format (instead of F16, which takes ~18 GB) so that the files fit within Kaggle's free-tier disk limits.
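
The file sizes quoted above follow roughly from the parameter count multiplied by the bits stored per weight. The sketch below reproduces that arithmetic for the 8.83B parameters listed for this model. The F16 and Q8_0 figures come from their storage layouts, while the Q4_K_M value of about 4.8 bits per weight is an assumed average, because that format mixes quantization types across tensors.

```python
# Approximate bits per weight. F16 is exact; Q8_0 stores 32 int8 weights plus
# one 16-bit scale per block (34 bytes / 32 weights = 8.5 bits). The Q4_K_M
# figure is an assumed average, since it mixes quantization types per tensor.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}

PARAMS = 8.83e9  # parameter count listed for this model


def estimated_size_gb(quant: str, params: float = PARAMS) -> float:
    """Estimate file size in GB (10^9 bytes), ignoring metadata overhead."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for name in BITS_PER_WEIGHT:
    print(f"{name}: ~{estimated_size_gb(name):.1f} GB")
```

The estimates land close to the ~18 GB, ~9.5 GB, and ~5.3 GB figures given in this README; small differences are expected because real GGUF files also contain metadata and some tensors kept at higher precision.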

Use with Modelfile

# Download GGUF + Modelfile
huggingface-cli download AkshajSeerpu/yi-coder-9b-q4km-gguf \
  yi-coder-9b-q4_k_m.gguf Modelfile --local-dir ./

# Import into Ollama
ollama create yicoder-q4 -f Modelfile

# Run
ollama run yicoder-q4