A 72B parameter coding model optimized for software engineering tasks, based on the Qwen2.5-72B architecture.
KAT-Dev-72B-Exp is a state-of-the-art coding model created by Kuaishou that achieves 74.6% accuracy on SWE-Bench Verified, making it one of the most capable open-source coding models available. It excels at code generation, debugging, and code review.
Four quantized versions are available, offering different trade-offs between quality and resource requirements:
| Variant | Size | Bits per Weight | Best For |
|---|---|---|---|
| iq4_xs | 39 GB | 4.25 bpw | Maximum quality, minimal degradation |
| iq3_m | 35 GB | 3.66 bpw | High quality, good balance |
| iq2_m | 29 GB | 2.70 bpw | Balanced compression |
| iq2_xxs | 25 GB | 2.06 bpw | Maximum compression, lowest memory use |
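As a rough cross-check, the listed sizes follow from bits-per-weight times parameter count; the actual GGUF files run somewhat larger because some tensors (embeddings, output head) are kept at higher precision. A quick sketch:

```python
# Rough weight-only size estimate: 72B parameters at the listed bpw.
PARAMS = 72e9

for variant, bpw in [("iq4_xs", 4.25), ("iq3_m", 3.66),
                     ("iq2_m", 2.70), ("iq2_xxs", 2.06)]:
    gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> gigabytes
    print(f"{variant}: ~{gb:.1f} GB (weights only)")
```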
To get started, pull your preferred quantization:

```bash
# Choose your preferred quantization
ollama pull richardyoung/kat-dev-72b:iq4_xs    # Best quality
ollama pull richardyoung/kat-dev-72b:iq3_m     # Recommended
ollama pull richardyoung/kat-dev-72b:iq2_m     # Lower memory
ollama pull richardyoung/kat-dev-72b:iq2_xxs   # Minimum memory
```

Then start an interactive session:

```bash
ollama run richardyoung/kat-dev-72b:iq3_m
```
Code Generation:

```bash
ollama run richardyoung/kat-dev-72b:iq3_m "Write a Python function to implement binary search"
```

Debugging:

```bash
ollama run richardyoung/kat-dev-72b:iq3_m "Debug this code: [paste your code]"
```

Code Review:

```bash
ollama run richardyoung/kat-dev-72b:iq3_m "Review and suggest improvements for: [code]"
```
All variants use optimized parameters for coding tasks:
- Temperature: 0.6 (balanced creativity and precision)
- Top-p: 0.9 (nucleus sampling)
- Top-k: 40
- Context length: 8192 tokens
- Chat template: Qwen-style (`<|im_start|>` / `<|im_end|>`)
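These defaults are baked into each variant, but Ollama lets you override them per request. A minimal sketch using the local REST API (the prompt and values here are illustrative, not recommendations):

```python
import requests

# Override the baked-in sampling parameters for a single request.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "richardyoung/kat-dev-72b:iq3_m",
        "prompt": "Explain the two-pointer technique in one paragraph.",
        "stream": False,
        "options": {
            "temperature": 0.2,  # more deterministic than the 0.6 default
            "top_p": 0.9,
            "top_k": 40,
            "num_ctx": 8192,     # context window in tokens
        },
    },
)
print(response.json()["response"])
```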
Approximate VRAM/RAM needed for inference:
| Variant | Minimum VRAM | Recommended VRAM |
|---|---|---|
| iq4_xs | 40 GB | 48 GB |
| iq3_m | 35 GB | 40 GB |
| iq2_m | 30 GB | 35 GB |
| iq2_xxs | 26 GB | 30 GB |
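If you are not sure how much VRAM you have, here is a quick check for NVIDIA hardware (assumes `nvidia-smi` is on your PATH; other vendors need their own tools):

```python
import subprocess

# Query total memory per GPU via nvidia-smi (values are reported in MiB).
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
    text=True,
)
for i, line in enumerate(out.strip().splitlines()):
    print(f"GPU {i}: {int(line) / 1024:.1f} GiB total VRAM")
```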
Quantization: IQ (Importance-Quantized) methods from llama.cpp:
- Preserves important weights with higher precision
- Optimizes less critical weights for size reduction
- Maintains model quality while reducing memory footprint
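As a conceptual illustration only (this toy sketch is not llama.cpp's actual algorithm), the idea is to spend precision where it matters most:

```python
import numpy as np

# Toy importance-aware quantization: weights whose activations matter
# more get a finer quantization grid than the rest.
rng = np.random.default_rng(0)
weights = rng.normal(size=1000)
importance = np.abs(rng.normal(size=1000))  # stand-in for activation statistics

def quantize(w, bits):
    # Uniform symmetric quantizer with 2**(bits-1)-1 positive levels.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

# Keep the top 10% most important weights at 4 bits, the rest at 2 bits.
cut = np.quantile(importance, 0.9)
dequant = np.where(importance >= cut, quantize(weights, 4), quantize(weights, 2))
print(f"mean abs error: {np.abs(weights - dequant).mean():.4f}")
```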
Original Model: Kwaipilot/KAT-Dev-72B-Exp
Quantizations: mradermacher on HuggingFace
iq4_xs: Choose if you
- Have 40GB+ VRAM available
- Need maximum quality for production use
- Are working on critical or complex projects

iq3_m (recommended): Choose if you
- Have 35-40GB VRAM available
- Want the best quality-to-size ratio
- Need reliable performance for most tasks

iq2_m: Choose if you
- Have 30-35GB VRAM available
- Can tolerate slight quality reduction
- Need to fit the model in limited memory

iq2_xxs: Choose if you
- Have 26-30GB VRAM available
- Prioritize memory efficiency
- Need quick prototyping or testing
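To automate that decision, here is a hypothetical helper that applies the guidance above (the thresholds mirror the Minimum VRAM column; this is not an official tool):

```python
# Hypothetical helper: pick the largest quantization that fits.
VARIANTS = [  # (tag, minimum VRAM in GB), best quality first
    ("iq4_xs", 40),
    ("iq3_m", 35),
    ("iq2_m", 30),
    ("iq2_xxs", 26),
]

def pick_variant(vram_gb: float) -> str | None:
    for tag, min_vram in VARIANTS:
        if vram_gb >= min_vram:
            return f"richardyoung/kat-dev-72b:{tag}"
    return None  # not enough memory for any variant

print(pick_variant(36))  # richardyoung/kat-dev-72b:iq3_m
```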
You can also call the model through Ollama's REST API from Python:

```python
import requests

def query_kat_dev(prompt, model="richardyoung/kat-dev-72b:iq3_m"):
    """Send a prompt to the local Ollama server and return the full reply."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
    )
    response.raise_for_status()
    return response.json()["response"]

# Example
code = query_kat_dev("Write a function to reverse a linked list in Python")
print(code)
```
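For long generations you may prefer streaming: with `"stream": true` the same endpoint returns newline-delimited JSON chunks. A sketch under the same assumptions as above:

```python
import json
import requests

# Stream tokens as they are generated; each line is a JSON object with a
# partial "response" field and a final "done" marker.
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "richardyoung/kat-dev-72b:iq3_m",
          "prompt": "Write a Python generator that yields primes.",
          "stream": True},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                break
```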
This model inherits the license from the original KAT-Dev-72B-Exp model. Please refer to the original model page for licensing details.
If you use this model in your research or applications, please cite:
```bibtex
@misc{kat-dev-72b-2025,
  author = {Kuaishou Technology},
  title  = {KAT-Dev-72B: A High-Performance Coding Model},
  year   = {2025},
  url    = {https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp}
}
```
For issues or questions:
- Ollama models: https://ollama.com/richardyoung/kat-dev-72b
- Original model: https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp
Note: This is an unofficial distribution. The model is quantized from the original KAT-Dev-72B-Exp for easier deployment via Ollama.