Verified 80B MoE (Unsloth GGUF) build of Qwen3-Coder-Next, validated for Ollama v0.15.5+ with fixed chat templates.

Tools

```bash
ollama run bazobehram/qwen3-coder-next
```

Applications

* Claude Code: `ollama launch claude --model bazobehram/qwen3-coder-next`
* Codex: `ollama launch codex --model bazobehram/qwen3-coder-next`
* OpenCode: `ollama launch opencode --model bazobehram/qwen3-coder-next`
* OpenClaw: `ollama launch openclaw --model bazobehram/qwen3-coder-next`

# Qwen3-Coder-Next (80B MoE) - Community Build

This is the **Qwen3-Coder-Next** model, converted to GGUF format by **Unsloth** (Q4_K_M quantization). It is designed to work with the latest Ollama Release Candidates (v0.15.5+) that support the new MoE/SSM architecture.

## Why use this build?
While an official library model exists, this build offers:

* **Unsloth Quantization:** Uses the widely respected Unsloth GGUF conversion, which some users find offers better stability or reasoning retention.
* **Verified Compatibility:** Personally tested and verified to work on `ollama/ollama:0.15.5-rc1`.
* **Alternative Template:** Configured with a robust standard chat template that ensures consistent chat performance if the official model's experimental template causes issues.
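
If you want to see exactly which chat template this build ships with (for example, to compare it against the official model's template before switching), `ollama show --template` prints it. A minimal sketch, guarded so it is a no-op when `ollama` is not on your `PATH`:

```shell
# Print the chat template baked into this build's Modelfile.
model="bazobehram/qwen3-coder-next"
if command -v ollama >/dev/null 2>&1; then
  ollama show "$model" --template
else
  echo "ollama not installed; skipping template inspection" >&2
fi
```

Running the same command against the official library tag lets you diff the two templates directly.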

## Requirements
* **Ollama Version:** Must be `v0.15.5` or newer (Release Candidate).
* **RAM:** ~48GB system RAM/VRAM required.
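
To confirm your installed Ollama meets the version requirement before pulling ~48GB of weights, you can compare versions with `sort -V`. A small sketch (the `version_ge` helper is illustrative, not part of Ollama):

```shell
# Succeeds if version $1 >= version $2, using version-aware sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

required="0.15.5"
# Extract the x.y.z version from `ollama --version`; empty if not installed.
installed="$(ollama --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"

if version_ge "${installed:-0}" "$required"; then
  echo "Ollama $installed is new enough for this model"
else
  echo "Ollama ${installed:-not found} is too old; need >= $required" >&2
fi
```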

## Usage

```bash
ollama run bazobehram/qwen3-coder-next
```

Based on [unsloth/Qwen3-Coder-Next-GGUF](https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF).

## Key Benefit vs. Official

The main difference is the quantization source. The official Ollama model likely uses llama.cpp's default quantization, while Unsloth often applies extra calibration steps or specific calibration datasets when creating their GGUFs, which can sometimes yield a "smarter" model at the same 4-bit size. Sharing this build gives people a choice, so they can test which one performs better on their specific coding tasks.
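
One way to make that comparison concrete is to send the same coding prompt to both builds and read the answers side by side. A hedged sketch (the official tag is assumed to be `qwen3-coder`; substitute whichever library tag you actually use):

```shell
# A/B-test the same prompt against this build and the official model.
prompt="Write a Python function that parses an ISO 8601 date string."

for model in bazobehram/qwen3-coder-next qwen3-coder; do
  echo "=== $model ==="
  if command -v ollama >/dev/null 2>&1; then
    # Tolerate a stopped daemon or a model that is not pulled yet.
    ollama run "$model" "$prompt" || true
  fi
done
```

For a fair comparison, keep the prompt and any sampling parameters (temperature, top_p) identical across both runs.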