Unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF with its defaults set to consume roughly 32 GB of VRAM.

📦 anarko/qwen3-coder-flash

🚀 Quick Start

# Pull the quantized model
ollama pull anarko/qwen3-coder-flash:30b
# Run the model
ollama run anarko/qwen3-coder-flash:30b
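
The model can also be driven through Ollama's local REST API. A minimal sketch using only the Python standard library, assuming the default endpoint at localhost:11434 and that ollama serve is running:

import json
import urllib.request

# Build a non-streaming generate request against the local Ollama API.
payload = {
    "model": "anarko/qwen3-coder-flash:30b",
    "prompt": "Write a Python function that reverses a linked list.",
    "stream": False,  # return one JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])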

📚 Model Overview

Source Model: Qwen3-Coder-30B-A3B-Instruct (Hugging Face)
Quantized Version: UD-Q4_K_XL

Key Features

  • Parameters: 30.5 B total, 3.3 B activated
  • Layers: 48
  • Attention Heads (GQA): 32 Q, 4 KV
  • Experts: 128 (8 activated)
  • Native Context Length: 262,144 tokens
  • Default Ollama Context Length: 145,408 tokens
  • VRAM Usage (145,408 tokens): ≈32 GB
  • Thinking Mode: Not supported

Runtime Parameters

>>> /show parameters

Output:

Model defined parameters:
 min_p                          0
 num_ctx                        145408
 repeat_penalty                 1.05
 stop                           "<|im_start|>"
 stop                           "<|im_end|>"
 temperature                    0.7
 top_k                          20
 top_p                          0.8
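
These Modelfile defaults can be overridden per request. A hedged sketch using the official Python client (pip install ollama); the keys under options mirror the names in the /show parameters output above:

import ollama

# Override the sampling defaults for a single request; unspecified
# options keep the Modelfile values shown above.
response = ollama.generate(
    model="anarko/qwen3-coder-flash:30b",
    prompt="Refactor a nested for-loop into a list comprehension.",
    options={
        "temperature": 0.2,  # below the 0.7 default for more deterministic code
        "num_ctx": 32768,    # smaller context window to cut KV-cache VRAM
    },
)
print(response["response"])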

Memory Usage (Context Length vs VRAM)

Context Length (Tokens)    Approx. VRAM Usage (GB)
142k (default)             ≈32
256k                       ≈44
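
These figures are consistent with the quantized weights plus an fp16 KV cache. A back-of-the-envelope sketch, assuming a head dimension of 128 (per the published Qwen3 config), Ollama's default f16 KV cache, and the ~18 GB of Q4 weights; real usage adds some runtime overhead:

# Estimate total VRAM as quantized weights + KV cache.
LAYERS, KV_HEADS, HEAD_DIM = 48, 4, 128  # from the Key Features above
BYTES_F16 = 2
WEIGHTS_GB = 18  # approximate size of the Q4 GGUF weights

def kv_cache_gb(num_ctx: int) -> float:
    # 2 tensors (K and V) per layer, each [kv_heads, head_dim] per token
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * num_ctx * BYTES_F16 / 1e9

for ctx in (145_408, 262_144):
    print(f"{ctx:>7,} tokens: ~{WEIGHTS_GB + kv_cache_gb(ctx):.0f} GB")

# prints roughly 32 GB at the default context and 44 GB at 256k,
# matching the table above.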

This Ollama model is based on the Hugging Face unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF. No changes have been made to it other than raising the default context length.