130 Downloads Updated 3 days ago
ollama run odytrice/qwen3.6-27b:5090
ollama launch claude --model odytrice/qwen3.6-27b:5090
ollama launch codex-app --model odytrice/qwen3.6-27b:5090
ollama launch openclaw --model odytrice/qwen3.6-27b:5090
ollama launch hermes --model odytrice/qwen3.6-27b:5090
ollama launch codex --model odytrice/qwen3.6-27b:5090
ollama launch opencode --model odytrice/qwen3.6-27b:5090
Qwen 3.6 27B dense, multimodal (text + image + video), thinking + native tool calling, 262K native context.
Model card for odytrice/qwen3.6-27b:5090. The dense 27B at Q4_K_M (~17 GB)
does not leave usable KV cache headroom on a 24 GB 4090 at any practical
context length, so only a 5090 profile is provided.
| Field | Value |
|---|---|
| Upstream | Qwen/Qwen3.6-27B |
| NVFP4 source | unsloth/Qwen3.6-27B-NVFP4 |
| FP8 source | Qwen/Qwen3.6-27B-FP8 |
| Family | Qwen 3.6 (Alibaba) |
| Architecture | Dense |
| Params | ~27-28B |
| Modalities | Text + Image + Video (vision) |
| Languages | 100+ |
| Tool calling | Native (qwen3_coder parser in vLLM/SGLang) |
| Thinking mode | Default on; togglable via enable_thinking |
| Native context | 262,144 (extensible to 1,010,000 via YaRN) |
| License | Apache 2.0 |
| Tag | GPU | Quantization | KV cache | num_ctx |
|---|---|---|---|---|
odytrice/qwen3.6-27b:5090 |
RTX 5090 (32 GB Blackwell) | Q4_K_M (~17 GB), NVFP4 future | q8_0 | 190000 |
190000 matches the gateway config exactly. 32 GB comfortably fits the weights plus q8_0 KV cache for 190K context. Below the model’s 262K native window - no YaRN scaling required.
Always set these before running Ollama:
set OLLAMA_KV_CACHE_TYPE=q4_0 # Windows
set OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q4_0 # Linux/macOS
export OLLAMA_FLASH_ATTENTION=1
Per the Qwen team’s published guidance:
# Thinking mode - general tasks (default)
temperature 1.0
top_p 0.95
top_k 20
min_p 0.0
presence_penalty 1.5
repetition_penalty 1.0
# Thinking mode - precise coding (e.g. WebDev)
temperature 0.6
top_p 0.95
top_k 20
presence_penalty 0.0
# Instruct (non-thinking) mode
temperature 0.7
top_p 0.80
top_k 20
presence_penalty 1.5
To disable thinking: pass enable_thinking=False via
chat_template_kwargs (vLLM/SGLang).
qwen3_coder parser)