13 4 hours ago

A low-latency agentic router powered by the brand-new Gemma 4 E2B (Effective 2B) architecture. Optimized for the Monk AI framework to provide near-instant task delegation and "Edge-to-Business" tool-calling on Jetson hardware.

vision tools thinking audio
ollama run rubinmaximilian/Monk-Router-Gemma4e2b

Details

86572402fe95 · 7.2GB · gemma4 · 5.12B · Q4_K_M
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
You are the Monk AI Logic Router. Your ONLY purpose is to output valid JSON. DO NOT provide explanat
{ "num_ctx": 2048, "stop": [ "<|turn|>", "<end_of_turn>" ], "tempera
[{"role":"user","content":"Can you quickly draft an email to my boss about the meeting?"},{"role":"a

Readme

Monk-Router-gemma4e2b

Monk-Router-gemma4e2b is a performance-first router designed for the Monk AI assistant. Built on Google’s 2026 E2B (Effective 2 Billion) architecture, it is tuned for fast routing decisions in edge computing environments and is intended as a lower-latency alternative to the similar router based on Phi4-mini.

This model is ideal for users prioritizing latency and VRAM efficiency on devices like the Jetson Orin Nano or MacBook Air. Please let me know if I should make an even larger model for scaled applications!

Performance

  • Low VRAM Footprint: Uses ~1.5GB of VRAM, allowing it to stay resident while worker models load.
  • Agentic Efficiency: Built on Gemma 4’s native tool-calling distillation for superior instruction following.
  • Speed: Optimized for “Time to First Token” (TTFT) in real-time assistant workflows.
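If you want to check TTFT on your own hardware, the following is a minimal sketch, assuming a local Ollama server on its default port and the streaming `/api/chat` endpoint, which returns one JSON object per line. The helper names (`build_request`, `time_to_first_token`, `measure_ttft`) are hypothetical and not part of Monk AI. The sketch counts only the first non-empty `content` chunk, so any streamed thinking output before it is included in the measured time.

```python
import json
import time
import urllib.request

# Assumptions: Ollama's default local endpoint and the model name from this page.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "rubinmaximilian/Monk-Router-Gemma4e2b"


def build_request(prompt: str) -> urllib.request.Request:
    """Build a streaming chat request for the router model."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def time_to_first_token(lines, start, clock=time.perf_counter):
    """Return seconds from `start` to the first chunk with non-empty content.

    `lines` is any iterable of newline-delimited JSON chunks (str or bytes).
    Returns None if the stream ends without content.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        if chunk.get("message", {}).get("content"):
            return clock() - start
    return None


def measure_ttft(prompt: str):
    """Send one prompt to a running Ollama server and time the first token."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return time_to_first_token(resp, start)
```

With the server running and the model pulled, `measure_ttft("Can you quickly draft an email to my boss about the meeting?")` returns the elapsed seconds. The first call also includes model load time, so a second call is more representative of resident performance.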

Logic Thresholds

  • Edge Route: Simple queries, small code snippets (<100 lines), and general chat.
  • GPU Route: High-VRAM requirements, multi-file analysis, and thermal-intensive tasks.
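The router makes this decision itself, but the thresholds above can be sketched as plain rules, for example as a fallback or as a sanity check on the model's choice. This is an illustration only: the `Task` fields and `pick_route` function are hypothetical, and sending snippets of 100 lines or more to the GPU route is an assumption, since the README only says that snippets under 100 lines stay on the edge.

```python
from dataclasses import dataclass

# From the "<100 lines" Edge Route threshold in the README.
EDGE_MAX_CODE_LINES = 100


@dataclass
class Task:
    """Hypothetical description of an incoming request."""
    code_lines: int = 0              # length of any code snippet in the request
    file_count: int = 1              # number of files the task touches
    needs_high_vram: bool = False    # e.g. a large worker model is required
    thermal_intensive: bool = False  # sustained load the edge device should avoid


def pick_route(task: Task) -> str:
    """Return "gpu" if any GPU Route condition holds, otherwise "edge"."""
    if task.needs_high_vram or task.thermal_intensive:
        return "gpu"
    if task.file_count > 1:  # multi-file analysis
        return "gpu"
    if task.code_lines >= EDGE_MAX_CODE_LINES:  # assumption: large snippets leave the edge
        return "gpu"
    return "edge"
```

For instance, `pick_route(Task(code_lines=40))` yields `"edge"`, while `pick_route(Task(file_count=3))` yields `"gpu"`.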

Example Output

```json
{
  "logic": "General logic task. Keeping on local Jetson.",
  "tool_call": {
    "name": "switch_model",
    "parameters": {
      "model_name": "gemma4-e2b"
    }
  }
}
```
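Because the router is meant to emit only JSON, a caller can validate each reply before acting on it. The following is a minimal sketch, assuming every reply has the same shape as the example output (a `logic` string and a `tool_call` with `name` and `parameters`). The `parse_router_output` helper is hypothetical and not part of Monk AI.

```python
import json


def parse_router_output(raw: str) -> dict:
    """Parse one router reply and check it has the shape of the example output.

    Raises ValueError if the reply is not valid JSON or a field is missing.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("router reply must be a JSON object")

    logic = data.get("logic")
    tool_call = data.get("tool_call")
    if not isinstance(logic, str):
        raise ValueError("missing string field 'logic'")
    if not isinstance(tool_call, dict):
        raise ValueError("missing object field 'tool_call'")

    name = tool_call.get("name")
    params = tool_call.get("parameters")
    if not isinstance(name, str) or not isinstance(params, dict):
        raise ValueError("'tool_call' needs a string 'name' and an object 'parameters'")

    return {"logic": logic, "name": name, "parameters": params}


# The example output from this README, as a single string.
EXAMPLE = (
    '{"logic": "General logic task. Keeping on local Jetson.", '
    '"tool_call": {"name": "switch_model", '
    '"parameters": {"model_name": "gemma4-e2b"}}}'
)
```

Calling `parse_router_output(EXAMPLE)` returns a dict whose `name` is `switch_model` and whose `parameters` carry the target `model_name`. Rejecting malformed replies up front keeps a bad generation from triggering a model switch.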