Bring the full punch of Unsloth’s latest Qwen3 coder MoE into llama.cpp and Ollama. This pack delivers every major quant of smirki/UIGEN-X-30B-MoE-merged-checkpoint-200 so you can drop 30B-class reasoning and code generation into local workflows without wrestling with custom runtimes.
| Tag | Format | Approx. Size | Ideal For |
|---|---|---|---|
| `richardyoung/uigen-x-30b-moe:q2_k` | Q2_K | ~10 GB | Minimal RAM/NPU deployments where footprint beats fidelity |
| `:q3_k_s` | Q3_K_S | ~12 GB | Balanced laptops; a good starter for experimentation |
| `:q4_k_m` | Q4_K_M | ~17 GB | Default daily driver on 24 GB GPUs / high-end CPU rigs |
| `:q5_k_m` | Q5_K_M | ~20 GB | Premium chat + coding with near-FP response quality |
| `:q6_k` | Q6_K | ~23 GB | When you refuse trade-offs but still want GGUF tooling |
| `:q8_0` | Q8_0 | ~30 GB | Benchmarking, further re-quantization, or 48 GB workstations |
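Not sure which tag your hardware can hold? A low-risk approach is to pull the smallest quant first and check the on-disk footprint before moving up. These are standard Ollama commands; nothing model-specific is assumed:

```
# Start with the smallest quant and confirm its size locally
ollama pull richardyoung/uigen-x-30b-moe:q2_k
ollama list

# Inspect the pulled model's metadata, template, and parameters
ollama show richardyoung/uigen-x-30b-moe:q2_k
```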
Each build keeps the original chat template (with the /think_on reasoning channel) and ships with tokenizer metadata for plug-and-play use in Ollama or llama.cpp.
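If you drive llama.cpp directly instead of Ollama, a minimal sketch looks like the following (the GGUF filename is illustrative; substitute whichever quant you downloaded):

```
# Serve the model over HTTP with llama.cpp's built-in server
llama-server -m ./UIGEN-X-30B-MoE-Q4_K_M.gguf --port 8080

# Or chat interactively in the terminal (conversation mode)
llama-cli -m ./UIGEN-X-30B-MoE-Q4_K_M.gguf -cnv
```

With Ollama, the pack is a single pull away: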
```
ollama pull richardyoung/uigen-x-30b-moe:q4_k_m
ollama run richardyoung/uigen-x-30b-moe:q4_k_m
```
Prompt example:

```
SYSTEM: You are a meticulous senior engineer. Explain your plan before coding.
USER: /think_on We need a Python CLI that syncs a local folder to S3, skipping archives older than 90 days. Provide tests.
```
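The same prompt can also be sent programmatically through Ollama's local HTTP API (it listens on localhost:11434 by default); a minimal sketch:

```
curl http://localhost:11434/api/chat -d '{
  "model": "richardyoung/uigen-x-30b-moe:q4_k_m",
  "stream": false,
  "messages": [
    {"role": "system", "content": "You are a meticulous senior engineer. Explain your plan before coding."},
    {"role": "user", "content": "/think_on We need a Python CLI that syncs a local folder to S3, skipping archives older than 90 days. Provide tests."}
  ]
}'
```

Set `"stream": true` if you want tokens back incrementally instead of a single JSON response.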
If you build something amazing with UIGEN-X, tag the maintainers—community benchmarks and agent demos help push the ecosystem forward.