
A high-precision, hardware-aware router built on Phi-4 Mini (3.8B). It acts as a dispatcher for local AI setups, automatically deciding whether a prompt should run on edge hardware (like a Jetson Nano), a local GPU, or the cloud based on task complexity.

tools
ollama run rubinmaximilian/Monk-Router-phi4mini

Applications

Claude Code: ollama launch claude --model rubinmaximilian/Monk-Router-phi4mini
Codex App: ollama launch codex-app --model rubinmaximilian/Monk-Router-phi4mini
OpenClaw: ollama launch openclaw --model rubinmaximilian/Monk-Router-phi4mini
Hermes Agent: ollama launch hermes --model rubinmaximilian/Monk-Router-phi4mini
Codex: ollama launch codex --model rubinmaximilian/Monk-Router-phi4mini
OpenCode: ollama launch opencode --model rubinmaximilian/Monk-Router-phi4mini


Readme

Monk-Router-phi4mini

Monk-Router is a high-precision routing model designed to manage hardware constraints in local AI setups. I built this to solve a specific problem: keeping simple tasks fast and local on edge devices, while automatically offloading heavy code analysis to larger servers.

Built on Microsoft’s Phi-4 Mini (3.8B) architecture, the model uses roughly 2.5GB of VRAM at Q4 quantization. That is small enough to stay resident in memory on edge devices like the Jetson Orin Nano or a MacBook, so it can reason about complex task delegation without causing out-of-memory errors when the actual worker models load.

How It Works

This model does not generate conversational responses. It is strictly a stateless JSON dispatcher, designed to be paired with a back-end script (written in Python, for example) that handles the actual execution.

Instead of hard-coding specific models or server destinations (which breaks if you run this on a different machine and limits overall customization and usability), the router expects the back-end to pass a list of currently available hardware and models. It then routes the user’s prompt to the most logical destination.
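As a minimal sketch of that hand-off, the back-end could assemble the router prompt from whatever resources are available at the moment. The function name `build_router_prompt` is hypothetical; the text layout follows the format shown under Usage Example, and the exact wording the model expects is an assumption based on that example.

```python
def build_router_prompt(capabilities, server_tiers, user_request):
    """Format the currently available resources plus the user's request.

    Hypothetical helper: the layout mirrors the README's usage example.
    """
    return (
        "AVAILABLE RESOURCES:\n"
        f"- Capabilities: {list(capabilities)}\n"
        f"- Server Tiers: {list(server_tiers)}\n"
        f'USER REQUEST: "{user_request}"'
    )


# The lists would normally come from probing the machine, not from constants.
prompt = build_router_prompt(
    ["code_small", "code_big", "general_reasoning"],
    ["tier_1_edge", "tier_2_main"],
    "Analyze this 2,000 line C++ file.",
)
print(prompt)
```

Because the lists are passed in at call time, the same script can run unchanged on a machine that only has an edge tier, or on one with a cloud fallback.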

The Routing Logic

The model evaluates the prompt and routes it based on three categories:

1. Hardware Tiers (set_server)
- tier_1_edge: Simple tasks or fast queries (keeps the task local).
- tier_2_main: Heavy logic, or analyzing large files >100 lines (offloads to a main PC/GPU).
- tier_3_cloud: Extremely large context requirements or fallback APIs.

2. Model Capabilities (switch_model)
- Maps the task to the right tool: code_small, code_big, writing, or general_reasoning.

3. Multi-Model Workflows (activate_swarm)
- Triggers custom back-end workflows if the task requires a multi-step review (e.g., cybersec_tester or code_review).
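On the back-end side, the three tool calls map naturally onto a small dispatch table. The sketch below is illustrative only: the handler functions are hypothetical placeholders, and apart from the `tier` parameter of `set_server` (which appears in the usage example) the parameter names for the other tool calls are not documented here, so the handlers treat them as an opaque dictionary.

```python
import json


def handle_set_server(params):
    # "tier" is the one parameter name confirmed by the README's example.
    return f"route to server tier {params.get('tier')}"


def handle_switch_model(params):
    # Parameter names for switch_model are an assumption; pass them through.
    return f"load worker model for {params}"


def handle_activate_swarm(params):
    # Parameter names for activate_swarm are an assumption; pass them through.
    return f"start workflow with {params}"


HANDLERS = {
    "set_server": handle_set_server,
    "switch_model": handle_switch_model,
    "activate_swarm": handle_activate_swarm,
}


def dispatch(router_output: str) -> str:
    """Parse the router's JSON and call the matching (placeholder) handler."""
    decision = json.loads(router_output)
    call = decision["tool_call"]
    handler = HANDLERS.get(call["name"])
    if handler is None:
        raise ValueError(f"unknown tool call: {call['name']!r}")
    return handler(call.get("parameters", {}))
```

Rejecting unknown tool names explicitly keeps a malformed router response from silently falling through to a default destination.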

Usage Example

1. What the Python backend sends to the router:

```text
AVAILABLE RESOURCES:
- Capabilities: ['code_small', 'code_big', 'general_reasoning']
- Server Tiers: ['tier_1_edge', 'tier_2_main']
USER REQUEST: "Analyze this 2,000 line C++ file."
```

Example Output:

```json
{
  "logic": "Massive codebase analysis exceeds edge capacity.",
  "tool_call": {
    "name": "set_server",
    "parameters": { "tier": "tier_2_main" }
  }
}
```
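One way to wire this up is sketched below, assuming a local Ollama server on its default port and its `/api/generate` endpoint. The helper names `ask_router` and `choose_tier` are hypothetical, and requesting `"format": "json"` is an assumption about what works well with this model rather than a documented requirement. The second function guards against the router naming a tier the back-end never offered.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "rubinmaximilian/Monk-Router-phi4mini"


def ask_router(prompt: str) -> dict:
    """Send one stateless request to the router and parse its JSON reply."""
    payload = json.dumps(
        {"model": MODEL, "prompt": prompt, "stream": False, "format": "json"}
    ).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    # With stream=False, the generated text arrives in the "response" field.
    return json.loads(body["response"])


def choose_tier(decision: dict, available_tiers: list, fallback: str) -> str:
    """Return the routed tier, or the fallback if the decision is unusable."""
    call = decision.get("tool_call", {})
    if call.get("name") != "set_server":
        return fallback
    tier = call.get("parameters", {}).get("tier")
    return tier if tier in available_tiers else fallback
```

With the example output above, `choose_tier(decision, ["tier_1_edge", "tier_2_main"], "tier_1_edge")` would select `tier_2_main`; a response naming `tier_3_cloud` would fall back to `tier_1_edge`, since that tier was not in the list sent to the router.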