1,618 Downloads Updated 1 week ago
ollama run carstenuhlig/omnicoder-2-9b:q4_k_m
70edfef80929 · 5.7GB
Fine-tune of Qwen3.5-9B on 425K agentic coding trajectories: terminal agent runs, SWE-bench patches, tool-use sequences. Built for IDE coding agents (OpenCode, Cline, Roo Code) and terminal pipelines, not general chat.
v2 trains on assistant tokens only. v1 saw all tokens including template boilerplate, which caused repetition loops and unstable tool-calling in long sessions. v2 also preserves think blocks on every turn, so the model reasons throughout a multi-step session rather than just at the final answer.
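Training on assistant tokens only is typically implemented by masking every non-assistant position out of the loss with the conventional ignore index (-100 in PyTorch-style trainers). A minimal sketch of that masking step, assuming a per-token role tag list (function and variable names are hypothetical, not from the actual training code):

```python
IGNORE_INDEX = -100  # conventional ignore index for cross-entropy loss

def mask_labels(token_ids, roles):
    """Return labels where only assistant tokens contribute to the loss.

    token_ids: token ids for the whole conversation
    roles: per-token role tags ("system", "user", "assistant")
    """
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

# Template/system/user tokens are masked out; assistant tokens are kept,
# so the model is never trained to imitate chat-template boilerplate.
labels = mask_labels(
    [101, 7, 8, 9, 102, 20, 21],
    ["system", "user", "user", "user", "assistant", "assistant", "assistant"],
)
# labels == [-100, -100, -100, -100, 102, 20, 21]
```

This is exactly the difference that bites in v1: when template tokens stay in the loss, the model learns to reproduce them, which shows up as repetition loops in long sessions.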
Original model: Tesslate/OmniCoder-2-9B
| Benchmark | OmniCoder 2 9B | Base Qwen3.5-9B |
|---|---|---|
| Terminal-Bench 2.0 | 25.8% | 14.6% |
| GPQA Diamond pass@1 | 83% | 81.7% |
| AIME 2025 pass@5 | 90% | 91.6% |
Recommended sampler settings: temperature 0.6 (0.2-0.4 for tool-heavy agentic use), top-p 0.95, top-k 20, context length 32768.
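These settings can be baked into a local Modelfile so every session starts with them. A sketch, using this page's model tag and standard Modelfile `PARAMETER` directives:

```
FROM carstenuhlig/omnicoder-2-9b:q4_k_m

# Recommended sampling for general use; drop temperature to 0.2-0.4
# for tool-heavy agentic runs.
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER num_ctx 32768
```

Then build and run it with `ollama create omnicoder-agent -f Modelfile` and `ollama run omnicoder-agent`.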
| Quant | Size | Notes |
|---|---|---|
| Q4_K_M (this) | 5.7 GB | 8 GB VRAM, recommended |
| Q5_K_M | 6.5 GB | Better tool-call reliability |
| Q8_0 | 9.5 GB | Near-lossless |
| BF16 | 17.9 GB | Best for production pipelines |
Note: I have not yet uploaded the other three quantizations listed above.
At Q4_K_M and below, tool-call failures increase in long agentic loops; lowering the temperature or stepping up to Q5_K_M helps.
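Another cheap mitigation at low quants is to validate each emitted tool call before executing it and re-prompt the model on malformed output instead of crashing the loop. A minimal sketch of the validation side, assuming JSON tool calls with `name`/`arguments` fields (the function name and call format are illustrative, not a specific agent framework's API):

```python
import json

def parse_tool_call(raw):
    """Validate a model-emitted tool call; return the dict, or None on failure."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON: caller should re-prompt instead of executing
    if not isinstance(call, dict):
        return None  # e.g. the model emitted a bare string or list
    if "name" not in call or not isinstance(call.get("arguments"), dict):
        return None  # missing required fields
    return call

# A well-formed call passes; a truncated one (a typical low-quant
# failure mode in long sessions) is rejected so the agent can retry.
ok = parse_tool_call('{"name": "read_file", "arguments": {"path": "main.py"}}')
bad = parse_tool_call('{"name": "read_file", "argum')
# ok["name"] == "read_file"; bad is None
```

Gating execution this way turns silent tool-call corruption into a bounded retry, which matters most exactly where this model card says failures cluster: long agentic loops at Q4_K_M and below.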