
Local-first AI tool router and coder in four sizes. 100% routing accuracy, 22/22 on the coding eval, 97% of queries handled locally. Beats Claude Opus-solo (97.1%) on routing.

4b 8b 14b 32b
ollama run dcostenco/prism-coder:1b7

Details

2 days ago

b1e2de329cab · 2.2GB · qwen3 · 2.03B · Q8_0
{ "num_ctx": 8192, "num_predict": 256, "stop": [ "<|im_end|>", "<|endoft

Readme

Prism Coder — Local-First AI Agent Tool Router

Fine-tuned on Qwen3 for AI agent tool routing. Part of a cascade architecture that handles 97% of queries locally — no cloud needed.

100% routing accuracy (BFCL eval). Beats Claude Opus-solo (97.1%) by 2.9 points.

Models

| Model | Size | Use when |
|---|---|---|
| prism-coder:1b7 | 1.1 GB | On-device, fastest (~60% of traffic) |
| prism-coder:8b | 5.0 GB | Lightweight local fallback |
| prism-coder:14b | 9.0 GB | Default: best balance (~37% of traffic) |
| prism-coder:32b | 20 GB | Edge cases the 14B fumbles (~2% of traffic) |

Quick Start

```bash
# Recommended default
ollama run dcostenco/prism-coder:14b

# Full cascade (on-device)
ollama run dcostenco/prism-coder:1b7
```

Cascade Architecture

- 1.7B on-device ✅ ~60%: free, instant
- 14B local ✅ ~37%: catches failures
- 32B local ✅ ~2%: edge cases
- Claude Opus 🌩️ ~1%: last resort

97% of traffic never touches the cloud.
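The tiers above can be read as a try-in-order loop: each tier either answers or passes the query up. The sketch below illustrates that reading only. The README does not say how a tier signals failure, so returning `None` to escalate is an assumption, and the handler functions are hypothetical stubs standing in for real model calls.

```python
from typing import Callable, Optional, Sequence, Tuple

# A handler returns a routing decision, or None to mean "escalate".
# The None convention is an assumption, not something the README specifies.
Handler = Callable[[str], Optional[str]]
Tier = Tuple[str, Handler]


def route_with_cascade(query: str, tiers: Sequence[Tier]) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, decision) from the first that answers."""
    for name, handler in tiers:
        decision = handler(query)
        if decision is not None:
            return name, decision
    raise RuntimeError("no tier produced a decision")


# Hypothetical stubs: a real setup would call the models instead.
def small_model(query: str) -> Optional[str]:
    return "session_load_context" if "resume" in query else None


def mid_model(query: str) -> Optional[str]:
    return "knowledge_search" if "look up" in query else None


def large_model(query: str) -> Optional[str]:
    return "session_search_memory" if "remember" in query else None


def cloud_fallback(query: str) -> Optional[str]:
    return "plain text"  # last resort always answers


TIERS = [
    ("prism-coder:1b7", small_model),
    ("prism-coder:14b", mid_model),
    ("prism-coder:32b", large_model),
    ("cloud", cloud_fallback),
]

if __name__ == "__main__":
    for q in ("resume my session", "look up the docs", "hello"):
        print(q, "->", route_with_cascade(q, TIERS))
```

Keeping the tiers as an ordered list of plain callables means the order and the fallback can be changed without touching the loop.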

Tools Routed

- session_load_context
- session_save_ledger
- session_save_handoff
- session_compact_ledger
- session_search_memory
- knowledge_search
- plain text (no tool needed)

Training

- Base: Qwen3 (1.7B / 8B / 14B / 32B)
- Method: MLX LoRA + direct safetensors merge
- Eval gate: 90% minimum before deploy
- Data: synthetic routing examples

Full Weights

HuggingFace: https://huggingface.co/dcostenco/prism-coder-14b

Need coding/IDE quality? Use dcostenco/prism-ide: same routing capability plus 22/22 on the coding eval.
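As a rough illustration of how the tool list and the 90% eval gate could fit together, the sketch below scores router outputs against expected tools and checks the result against the threshold. It assumes the model emits a bare tool name, which the README does not confirm. The function names are hypothetical and this is not the project's actual eval harness.

```python
from typing import Iterable, Optional, Tuple

# Tool names copied from the list above; anything else counts as plain text.
KNOWN_TOOLS = frozenset({
    "session_load_context",
    "session_save_ledger",
    "session_save_handoff",
    "session_compact_ledger",
    "session_search_memory",
    "knowledge_search",
})

EVAL_GATE = 0.90  # "90% minimum before deploy"


def normalize_route(raw_output: str) -> Optional[str]:
    """Return the tool name if the output is exactly a known tool, else None (plain text).

    Assumption: the router emits a bare tool name. The real format may differ.
    """
    candidate = raw_output.strip()
    return candidate if candidate in KNOWN_TOOLS else None


def routing_accuracy(pairs: Iterable[Tuple[str, Optional[str]]]) -> float:
    """Fraction of (raw_output, expected_tool) pairs that were routed correctly."""
    items = list(pairs)
    if not items:
        raise ValueError("need at least one eval example")
    correct = sum(1 for raw, expected in items if normalize_route(raw) == expected)
    return correct / len(items)


def passes_eval_gate(accuracy: float, threshold: float = EVAL_GATE) -> bool:
    """True when accuracy meets or exceeds the deploy threshold."""
    return accuracy >= threshold


if __name__ == "__main__":
    examples = [
        ("session_load_context", "session_load_context"),
        ("  knowledge_search\n", "knowledge_search"),
        ("Hello there", None),  # plain text, no tool expected
        ("session_save_ledger", "session_save_handoff"),  # a routing miss
    ]
    acc = routing_accuracy(examples)
    print(f"accuracy={acc:.2f} deploy={passes_eval_gate(acc)}")
```

Treating every unrecognized output as plain text keeps the check strict: a misspelled tool name counts as a miss rather than being silently accepted.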

Built For

- Prism AAC: communication app for non-verbal users
- Prism Coder: AI dev assistant