ollama run dcostenco/prism-coder:4b
Fine-tuned on Qwen3 for AI agent tool routing. Part of a cascade architecture that handles 97% of queries locally — no cloud needed.
100% routing accuracy (BFCL eval). Beats Claude Opus-solo (97.1%) by 2.9 points.

| Model | Size | Use when |
|---|---|---|
| prism-coder:1b7 | 1.1 GB | On-device, fastest (~60% of traffic) |
| prism-coder:8b | 5.0 GB | Lightweight local fallback |
| prism-coder:14b | 9.0 GB | Default, best balance (~37% of traffic) |
| prism-coder:32b | 20 GB | Edge cases the 14B fumbles (~2% of traffic) |

```bash
ollama run dcostenco/prism-coder:14b
ollama run dcostenco/prism-coder:1b7
```

## Cascade Architecture

- 1.7B on-device ✅ ~60%: free, instant
- 14B local ✅ ~37%: catches failures
- 32B local ✅ ~2%: edge cases
- Claude Opus 🌩️ ~1%: last resort

97% of traffic never touches the cloud.
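
The tiers above read as a fall-through chain, which can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Tier` class, the `cascade` function, and the stub routers are all hypothetical. The source does not say how a tier decides it has failed, so this sketch assumes a tier returns `None` to escalate.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A router maps a query to a routing decision, or None to escalate.
# Returning None on failure is an assumption made for this sketch.
Router = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class Tier:
    name: str      # label from the cascade list, e.g. "1.7B on-device"
    route: Router  # hypothetical callable wrapping the model for this tier


def cascade(query: str, tiers: Sequence[Tier]) -> Tuple[str, str]:
    """Return (tier name, decision) from the first tier that answers."""
    for tier in tiers:
        decision = tier.route(query)
        if decision is not None:
            return tier.name, decision
    raise RuntimeError("no tier produced a routing decision")


# Stub routers standing in for real model calls (illustrative only).
def small_stub(query: str) -> Optional[str]:
    return "session_search_memory" if "recall" in query else None


def medium_stub(query: str) -> Optional[str]:
    return "knowledge_search" if "docs" in query else None


def large_stub(query: str) -> Optional[str]:
    return "session_compact_ledger" if "compact" in query else None


def last_resort_stub(query: str) -> Optional[str]:
    return "plain_text"  # the final tier always answers in this sketch


TIERS = [
    Tier("1.7B on-device", small_stub),
    Tier("14B local", medium_stub),
    Tier("32B local", large_stub),
    Tier("Claude Opus", last_resort_stub),
]

if __name__ == "__main__":
    for q in ("recall what we decided yesterday",
              "search the docs for retry policy",
              "hello there"):
        print(q, "->", cascade(q, TIERS))
```

Ordering the tiers from cheapest to most expensive means a query pays for a larger model only when every smaller one has declined it.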

## Tools Routed

- `session_load_context`
- `session_save_ledger`
- `session_save_handoff`
- `session_compact_ledger`
- `session_search_memory`
- `knowledge_search`
- plain text (no tool needed)

## Training

- Base: Qwen3 (1.7B / 8B / 14B / 32B)
- Method: MLX LoRA + direct safetensors merge
- Eval gate: 90% minimum before deploy
- Data: synthetic routing examples

## Full Weights

HuggingFace: https://huggingface.co/dcostenco/prism-coder-14b

Need coding/IDE quality? Use `dcostenco/prism-ide`, which has the same routing capability plus a 22/22 coding eval score.
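
Returning to the routed tools and the 90% eval gate listed under Training: a deploy check of that kind might look like the sketch below. The tool names and the 90% threshold come from this page. Everything else is an assumption for illustration, including `routing_accuracy`, `passes_gate`, `keyword_router`, and the sample queries, and the real evaluation harness may work differently.

```python
from typing import Callable, Iterable, Optional, Tuple

# Tool names copied from the "Tools Routed" list.
# None stands for "plain text (no tool needed)".
TOOLS = (
    "session_load_context",
    "session_save_ledger",
    "session_save_handoff",
    "session_compact_ledger",
    "session_search_memory",
    "knowledge_search",
)
EVAL_GATE = 0.90  # "Eval gate: 90% minimum before deploy"

Example = Tuple[str, Optional[str]]  # (query, expected tool or None)


def routing_accuracy(predict: Callable[[str], Optional[str]],
                     examples: Iterable[Example]) -> float:
    """Fraction of examples where the predicted tool matches the label."""
    examples = list(examples)
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for query, expected in examples
                  if predict(query) == expected)
    return correct / len(examples)


def passes_gate(accuracy: float, gate: float = EVAL_GATE) -> bool:
    """True when accuracy meets the minimum ("90% minimum" read as >=)."""
    return accuracy >= gate


# Hypothetical stand-in for a model call; a real check would query the model.
def keyword_router(query: str) -> Optional[str]:
    q = query.lower()
    if "search memory" in q:
        return "session_search_memory"
    if "handoff" in q:
        return "session_save_handoff"
    if "look up" in q:
        return "knowledge_search"
    return None


# Made-up labeled examples, for illustration only.
EXAMPLES = [
    ("search memory for the API key discussion", "session_search_memory"),
    ("save a handoff for the next session", "session_save_handoff"),
    ("look up the retry policy", "knowledge_search"),
    ("thanks, that is all", None),
]

if __name__ == "__main__":
    acc = routing_accuracy(keyword_router, EXAMPLES)
    print(f"accuracy={acc:.2%} deploy={passes_gate(acc)}")
```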

## Built For

- Prism AAC: communication app for non-verbal users
- Prism Coder: AI dev assistant