Bring the full punch of Unsloth’s latest Qwen3 coder MoE into llama.cpp and Ollama. This pack delivers every major quant of smirki/UIGEN-X-30B-MoE-merged-checkpoint-200 so you can drop 30B-class reasoning and code generation into local workflows without wrestling with custom runtimes.
| Tag | Format | Approx. Size | Ideal For |
|---|---|---|---|
| `richardyoung/uigen-x-30b-moe:q2_k` | Q2_K | ~10 GB | Minimal RAM/NPU deployments where footprint beats fidelity |
| `:q3_k_s` | Q3_K_S | ~12 GB | Balanced laptops; a good starter for experimentation |
| `:q4_k_m` | Q4_K_M | ~17 GB | Default daily driver on 24 GB GPUs / high-end CPU rigs |
| `:q5_k_m` | Q5_K_M | ~20 GB | Premium chat + coding with near-FP response quality |
| `:q6_k` | Q6_K | ~23 GB | When you refuse trade-offs but still want GGUF tooling |
| `:q8_0` | Q8_0 | ~30 GB | Benchmarking, further re-quantization, or 48 GB workstations |
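Not sure which tag your hardware can hold? A low-risk approach is to pull the smallest quant first and check the on-disk footprint before moving up. These are standard Ollama commands; nothing model-specific is assumed:

```
# Start with the smallest quant and confirm its size locally
ollama pull richardyoung/uigen-x-30b-moe:q2_k
ollama list

# Inspect the pulled model's metadata, template, and parameters
ollama show richardyoung/uigen-x-30b-moe:q2_k
```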
Each build keeps the original chat template (with the /think_on reasoning channel) and ships with tokenizer metadata for plug-and-play use in Ollama or llama.cpp.
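If you drive llama.cpp directly instead of Ollama, a minimal sketch looks like the following (the GGUF filename is illustrative; substitute whichever quant you downloaded):

```
# Serve the model over HTTP with llama.cpp's built-in server
llama-server -m ./UIGEN-X-30B-MoE-Q4_K_M.gguf --port 8080

# Or chat interactively in the terminal (conversation mode)
llama-cli -m ./UIGEN-X-30B-MoE-Q4_K_M.gguf -cnv
```

With Ollama, the pack is a single pull away: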
```
ollama pull richardyoung/uigen-x-30b-moe:q4_k_m
ollama run richardyoung/uigen-x-30b-moe:q4_k_m
```
Prompt example:

```
SYSTEM: You are a meticulous senior engineer. Explain your plan before coding.
USER: /think_on We need a Python CLI that syncs a local folder to S3, skipping archives older than 90 days. Provide tests.
```
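The same prompt can also be sent programmatically through Ollama's local HTTP API (it listens on localhost:11434 by default); a minimal sketch:

```
curl http://localhost:11434/api/chat -d '{
  "model": "richardyoung/uigen-x-30b-moe:q4_k_m",
  "stream": false,
  "messages": [
    {"role": "system", "content": "You are a meticulous senior engineer. Explain your plan before coding."},
    {"role": "user", "content": "/think_on We need a Python CLI that syncs a local folder to S3, skipping archives older than 90 days. Provide tests."}
  ]
}'
```

Set `"stream": true` if you want tokens back incrementally instead of a single JSON response.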
If you build something amazing with UIGEN-X, tag the maintainers—community benchmarks and agent demos help push the ecosystem forward.