Unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF with its defaults set to consume roughly 32 GB of VRAM.

📦 anarko/qwen3-coder-flash

🚀 Quick Start

# Pull the quantized model
ollama pull anarko/qwen3-coder-flash:30b
# Run the model
ollama run anarko/qwen3-coder-flash:30b
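
The model can also be driven through Ollama's local REST API. A minimal sketch using only the Python standard library, assuming the default endpoint at localhost:11434 and that ollama serve is running:

import json
import urllib.request

# Build a non-streaming generate request against the local Ollama API.
payload = {
    "model": "anarko/qwen3-coder-flash:30b",
    "prompt": "Write a Python function that reverses a linked list.",
    "stream": False,  # return one JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])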

📚 Model Overview

Source Model: Qwen3-Coder-30B-A3B-Instruct (Hugging Face)
Quantized Version: UD-Q4_K_XL

Key Features

  • Parameters: 30.5 B total, 3.3 B activated
  • Layers: 48
  • Attention Heads (GQA): 32 Q, 4 KV
  • Experts: 128 (8 activated)
  • Native Context Length: 262,144 tokens
  • Default Ollama Context Length: 145,408 tokens
  • VRAM Usage (145,408 tokens): ≈32 GB
  • Thinking Mode: Not supported

Runtime Parameters

>>> /show parameters

Output:

Model defined parameters:
 min_p                          0
 num_ctx                        145408
 repeat_penalty                 1.05
 stop                           "<|im_start|>"
 stop                           "<|im_end|>"
 temperature                    0.7
 top_k                          20
 top_p                          0.8
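
These Modelfile defaults can be overridden per request. A hedged sketch using the official Python client (pip install ollama); the keys under options mirror the names in the /show parameters output above:

import ollama

# Override the sampling defaults for a single request; unspecified
# options keep the Modelfile values shown above.
response = ollama.generate(
    model="anarko/qwen3-coder-flash:30b",
    prompt="Refactor a nested for-loop into a list comprehension.",
    options={
        "temperature": 0.2,  # below the 0.7 default for more deterministic code
        "num_ctx": 32768,    # smaller context window to cut KV-cache VRAM
    },
)
print(response["response"])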

Memory Usage (Context Length vs VRAM)

Context Length (Tokens)    Approx. VRAM Usage (GB)
142k (default)             ≈32
256k                       ≈44
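
These figures are consistent with the quantized weights plus an fp16 KV cache. A back-of-the-envelope sketch, assuming a head dimension of 128 (per the published Qwen3 config), Ollama's default f16 KV cache, and the ~18 GB of Q4 weights; real usage adds some runtime overhead:

# Estimate total VRAM as quantized weights + KV cache.
LAYERS, KV_HEADS, HEAD_DIM = 48, 4, 128  # from the Key Features above
BYTES_F16 = 2
WEIGHTS_GB = 18  # approximate size of the Q4 GGUF weights

def kv_cache_gb(num_ctx: int) -> float:
    # 2 tensors (K and V) per layer, each [kv_heads, head_dim] per token
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * num_ctx * BYTES_F16 / 1e9

for ctx in (145_408, 262_144):
    print(f"{ctx:>7,} tokens: ~{WEIGHTS_GB + kv_cache_gb(ctx):.0f} GB")

# prints roughly 32 GB at the default context and 44 GB at 256k,
# matching the table above.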

This Ollama model is based on the Hugging Face unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF. No changes have been made to it other than raising the default context length.