
Unsloth-tuned Qwen3 30B mixture‑of‑experts model built for heavy coding, reasoning, and agentic workflows.

3 weeks ago

250cace5dd4f · 25GB · qwen3moe · 30.5B · Q6_K
Template: SYSTEM: / USER: / ASSISTANT: turns
Parameters: stop ["USER:", "ASSISTANT:"], temperature 0.7, top_p 0.…


UIGEN-X-30B-MoE (GGUF Suite)

Bring the full punch of Unsloth’s latest Qwen3 coder MoE into llama.cpp and Ollama. This pack provides six GGUF quantizations (Q2_K through Q8_0) of smirki/UIGEN-X-30B-MoE-merged-checkpoint-200, so you can drop 30B-class reasoning and code generation into local workflows without wrestling with custom runtimes.

Why Pick UIGEN-X?

  • Agent-native brain – Trained on multi-step GRPO traces, planning data, and tool invocations. You get deep chain-of-thought and API-call fluency out of the box.
  • Coder-first MoE – Built on Qwen3 30B A3B with 128 routed experts (top‑8 active). Only about 3B of its 30.5B parameters are active per token (the "A3B" in the name), so it draws on 30B-class capacity while inference cost tracks a much smaller dense model.
  • Massive context – A 262K-token window lets you load large codebases, session logs, or agent transcripts into a single prompt without truncation.
  • Polished alignment – Safety and instruction tuning come from the UIGEN team, built on Unsloth’s open-source tooling; the model answers confidently while staying compliant.
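Using the long window in Ollama usually means raising the context size yourself: unless the shipped parameters already set it, Ollama allocates a much smaller default context than 262K tokens. Below is a minimal sketch, assuming a local Ollama server on its default port 11434 and the q4_k_m tag. `num_ctx` is a standard Ollama option; the helper names and the 65,536 default are illustrative choices, not recommendations.

```python
import json
import urllib.request

# Assumptions: a local Ollama server on its default port, and the q4_k_m tag.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"
MODEL = "richardyoung/uigen-x-30b-moe:q4_k_m"
MAX_CONTEXT = 262_144  # the advertised 262K-token window


def build_long_context_request(prompt: str, num_ctx: int = 65_536) -> dict:
    """Build an /api/generate payload with an enlarged context window.

    65_536 is an illustrative value; the KV cache grows with num_ctx,
    so larger windows need more memory.
    """
    if not 0 < num_ctx <= MAX_CONTEXT:
        raise ValueError(f"num_ctx must be in 1..{MAX_CONTEXT}")
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"num_ctx": num_ctx},
    }


def generate(payload: dict) -> str:
    """POST the payload to Ollama and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running server; transcript.txt is a hypothetical file):
# with open("transcript.txt", encoding="utf-8") as f:
#     print(generate(build_long_context_request("Summarize:\n" + f.read())))
```

Validating `num_ctx` up front keeps requests within the advertised window; the memory cost of a large window is the main reason to pick it per task rather than always using the maximum.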

Pick Your Quant

Tag                                   Format   Approx. size   Ideal for
richardyoung/uigen-x-30b-moe:q2_k     Q2_K     ~10 GB         Minimal-RAM/NPU deployments where footprint beats fidelity
:q3_k_s                               Q3_K_S   ~12 GB         Balanced laptops; a good starting point for experimentation
:q4_k_m                               Q4_K_M   ~17 GB         Default daily driver on 24 GB GPUs or high-end CPU rigs
:q5_k_m                               Q5_K_M   ~20 GB         Premium chat and coding with near-FP16 response quality
:q6_k                                 Q6_K     ~23 GB         When you refuse trade-offs but still want GGUF tooling
:q8_0                                 Q8_0     ~30 GB         Benchmarking, further re-quantization, or 48 GB workstations
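The quant choice comes down to fitting the weights plus working memory into your RAM or VRAM. A minimal sketch of that rule of thumb, using the approximate sizes from the table; the 5 GB default headroom for the KV cache and runtime overhead is an assumption, and long contexts need considerably more. `pick_quant` is a hypothetical helper, not part of Ollama.

```python
from typing import Optional

# Approximate sizes in GB, copied from the quant table.
QUANT_SIZES_GB = {
    "q2_k": 10,
    "q3_k_s": 12,
    "q4_k_m": 17,
    "q5_k_m": 20,
    "q6_k": 23,
    "q8_0": 30,
}


def pick_quant(memory_gb: float, headroom_gb: float = 5.0) -> Optional[str]:
    """Return the largest quant tag whose weights fit after reserving headroom.

    headroom_gb is an assumed allowance for the KV cache and runtime overhead.
    """
    budget = memory_gb - headroom_gb
    fitting = [tag for tag, size in QUANT_SIZES_GB.items() if size <= budget]
    if not fitting:
        return None  # no quant fits within the budget
    return max(fitting, key=lambda tag: QUANT_SIZES_GB[tag])


# Usage: pull the suggested tag for a 24 GB GPU.
# tag = pick_quant(24)
# print(f"ollama pull richardyoung/uigen-x-30b-moe:{tag}")
```

With the assumed 5 GB headroom, a 24 GB GPU maps to q4_k_m and a 48 GB workstation to q8_0, in line with the table's suggestions.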

Each build keeps the original chat template (with the /think_on reasoning channel) and ships with tokenizer metadata for plug-and-play use in Ollama or llama.cpp.
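When driving a raw completion endpoint instead of Ollama's chat API, you have to lay out the turns yourself. The sketch below approximates the SYSTEM: / USER: / ASSISTANT: layout and reuses the published stop strings. The newline separators are an assumption: the exact whitespace in the shipped template may differ, so treat this as a starting point rather than a byte-exact copy. `format_prompt` is a hypothetical helper.

```python
from typing import Optional

# Stop strings taken from the model's published parameters.
STOP = ["USER:", "ASSISTANT:"]

ROLE_PREFIX = {"user": "USER", "assistant": "ASSISTANT"}


def format_prompt(messages: list[dict], system: Optional[str] = None) -> str:
    """Approximate the SYSTEM:/USER:/ASSISTANT: layout for raw completion.

    Assumption: turns are separated by single newlines. Roles other than
    "user" and "assistant" raise KeyError.
    """
    parts = []
    if system:
        parts.append(f"SYSTEM: {system}")
    for m in messages:
        parts.append(f"{ROLE_PREFIX[m['role']]}: {m['content']}")
    parts.append("ASSISTANT:")  # leave the assistant turn open for generation
    return "\n".join(parts)


# Usage:
# prompt = format_prompt([{"role": "user", "content": "/think_on Plan a refactor."}],
#                        system="You are a meticulous senior engineer.")
```

The resulting string and the `STOP` list map onto the `prompt` and `stop` fields of llama.cpp's `llama-server` `/completion` endpoint.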

Quick Start

ollama pull richardyoung/uigen-x-30b-moe:q4_k_m
ollama run richardyoung/uigen-x-30b-moe:q4_k_m

Prompt example:

SYSTEM: You are a meticulous senior engineer. Explain your plan before coding.
USER: /think_on We need a Python CLI that syncs a local folder to S3, skipping archives older than 90 days. Provide tests.
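The same prompt can be sent programmatically through Ollama's REST API. A minimal sketch, assuming a local server on the default port 11434 and that the template routes the system message into its SYSTEM: slot; `build_chat_payload` and `chat` are illustrative names, not part of any library.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local server


def build_chat_payload(system: str, user: str,
                       model: str = "richardyoung/uigen-x-30b-moe:q4_k_m") -> dict:
    """Build a non-streaming /api/chat request with one system and one user turn."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "stream": False,  # one JSON object with the full reply
    }


def chat(payload: dict) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Usage (needs a running server):
# payload = build_chat_payload(
#     "You are a meticulous senior engineer. Explain your plan before coding.",
#     "/think_on We need a Python CLI that syncs a local folder to S3, "
#     "skipping archives older than 90 days. Provide tests.",
# )
# print(chat(payload))
```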

Capabilities Snapshot

  • 👨‍💻 Code – Long-form code synthesis, refactors, pytest scaffolding, and shell automation.
  • 🧠 Reasoning – Handles multi-hop deductions, math puzzles, and policy analysis with structured chains.
  • 🧰 Agents & Tools – Great fit for function-calling, LangChain/LlamaIndex wrappers, or self-reflective loops.
  • 📚 Document digestion – Summaries, comparisons, and QA across giant PDFs or repo histories.

Credits & Thanks

  • Base model: Unsloth & smirki – original UIGEN-X-30B training + open release.
  • Qwen team: Alibaba/ModelScope Qwen3 for the stellar foundation.
  • Quantization & distribution: Richard Young · generated via the llama.cpp GGUF pipeline for Ollama.

If you build something amazing with UIGEN-X, tag the maintainers—community benchmarks and agent demos help push the ecosystem forward.