Second-gen OmniCoder, fine-tuned from Qwen3.5-9B. Trained on assistant tokens only (unlike v1): no more repetition loops, and stable tool calling in long agentic sessions. Original: https://hf.co/Tesslate/OmniCoder-2-9B

Capabilities: tools, thinking
ollama run carstenuhlig/omnicoder-2-9b

Applications

Claude Code: ollama launch claude --model carstenuhlig/omnicoder-2-9b
Codex: ollama launch codex --model carstenuhlig/omnicoder-2-9b
OpenCode: ollama launch opencode --model carstenuhlig/omnicoder-2-9b
OpenClaw: ollama launch openclaw --model carstenuhlig/omnicoder-2-9b

OmniCoder 2 9B

Fine-tune of Qwen3.5-9B on 425K agentic coding trajectories: terminal agent runs, SWE-bench patches, tool-use sequences. Built for IDE coding agents (OpenCode, Cline, Roo Code) and terminal pipelines, not general chat.

v2 trains on assistant tokens only. v1 computed loss on every token, including template boilerplate, which caused repetition loops and unstable tool calling in long sessions. v2 also preserves think blocks on every turn, so the model reasons throughout a multi-step session rather than only at the final answer.
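To make the difference concrete, here is a minimal sketch of assistant-token-only loss masking in the Hugging Face style (hypothetical, not the authors' actual training code; `assistant_spans` and the helper name are assumptions for illustration):

```python
# Hypothetical sketch: keep loss only on assistant tokens (incl. <think> blocks).
import torch

IGNORE_INDEX = -100  # HF convention: label -100 contributes nothing to the loss

def mask_non_assistant(input_ids: torch.Tensor,
                       assistant_spans: list[tuple[int, int]]) -> torch.Tensor:
    """Build labels that keep only assistant turns; everything else is masked.

    assistant_spans: (start, end) token-index ranges of each assistant turn,
    including its think block, as located via the chat template.
    """
    labels = torch.full_like(input_ids, IGNORE_INDEX)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels

# v1, by contrast, effectively used labels = input_ids.clone(): loss on user,
# system, and template tokens too, which the card blames for repetition loops.
```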

Original model: Tesslate/OmniCoder-2-9B

Benchmarks (self-reported)

| Benchmark | OmniCoder 2 9B | Base Qwen3.5-9B |
|---|---|---|
| Terminal-Bench 2.0 | 25.8% | 14.6% |
| GPQA Diamond pass@1 | 83.0% | 81.7% |
| AIME 2025 pass@5 | 90.0% | 91.6% |

Parameters and quantization

Temperature 0.6 (0.2-0.4 for tool-heavy agentic work), top-p 0.95, top-k 20, context length 32,768 tokens.
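These map directly onto Ollama's REST API options. A minimal sketch (assumes `ollama serve` is running on the default port; the prompt is only an illustration):

```python
# Call the model through Ollama's local REST API with the recommended settings.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "carstenuhlig/omnicoder-2-9b",
        "messages": [
            {"role": "user", "content": "Write a unified diff that fixes the failing test."}
        ],
        "options": {
            "temperature": 0.6,  # drop to 0.2-0.4 for tool-heavy agentic use
            "top_p": 0.95,
            "top_k": 20,
            "num_ctx": 32768,
        },
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```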

| Quant | Size | Notes |
|---|---|---|
| Q4_K_M (this upload) | 5.7 GB | Fits in 8 GB VRAM; recommended |
| Q5_K_M | 6.5 GB | Better tool-call reliability |
| Q8_0 | 9.5 GB | Near-lossless |
| BF16 | 17.9 GB | Best for production pipelines |

Note: the other three quantizations have not been uploaded yet.

At Q4_K_M and below, tool-call failures become more frequent in long agentic loops; lowering the temperature or stepping up to Q5_K_M helps.
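If you want the lower temperature baked in for agentic runs, one option is a derived model via a Modelfile (a sketch; the name `omnicoder-2-9b-agent` below is arbitrary):

```
FROM carstenuhlig/omnicoder-2-9b
PARAMETER temperature 0.3
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER num_ctx 32768
```

Then create and run it as usual: `ollama create omnicoder-2-9b-agent -f Modelfile`.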