ollama run mdq100/qwen3.5:27b-96g
ollama launch claude --model mdq100/qwen3.5:27b-96g
ollama launch codex --model mdq100/qwen3.5:27b-96g
ollama launch opencode --model mdq100/qwen3.5:27b-96g
ollama launch openclaw --model mdq100/qwen3.5:27b-96g
Custom Qwen3.5 variants optimized for 128GB unified memory systems such as the AMD Ryzen AI Max+ 395. On Windows 11, the GPU is limited to 96GB (32GB is reserved for the OS/CPU), so the context window is capped at 131,072 tokens (128K) to fit within GPU memory without timing out. All variants retain full capabilities: vision, tools, and thinking mode.
Base models: qwen3.5:27b, qwen3.5:122b
Hardware tested: GMKtec EVO-X2 AI Mini PC, AMD Ryzen AI Max+ 395, AMD Radeon 8060S, 128GB LPDDR5X-8000, Windows 11 Pro
Custom version of qwen3.5:122b with the context window capped at 128K so the model fits within 96GB of GPU memory, matching gpt-oss:120b’s context window.
| | Default | Custom |
|---|---|---|
| Model | qwen3.5:122b | qwen3.5:122b-96g |
| Context window | 262,144 tokens | 131,072 tokens (128K) |
| Weights | ~70GB | ~70GB |
| KV cache | ~27GB | ~13.5GB |
| Total size | ~97GB — timeout | ~92GB — fits |
| CPU/GPU split | all GPU (OOM) | 28% CPU / 72% GPU |
| Headroom | -1GB | +4GB |
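As a sketch of how a capped variant like this can be built: an Ollama Modelfile that sets `num_ctx` (Ollama’s context-window parameter) to 131,072 does the job. The base tag below assumes `qwen3.5:122b` is available locally.

```
FROM qwen3.5:122b
PARAMETER num_ctx 131072
```

Then `ollama create qwen3.5:122b-96g -f Modelfile` produces the custom tag; the same pattern applies to the 27B variant.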
Custom version of qwen3.5:27b with the context window capped at 128K to reduce unnecessary KV cache memory usage on a 96GB GPU.
| | Default | Custom |
|---|---|---|
| Model | qwen3.5:27b | qwen3.5:27b-96g |
| Context window | 262,144 tokens | 131,072 tokens (128K) |
| Weights | ~15GB | ~15GB |
| KV cache | ~27GB | ~13.5GB |
| Total GPU | ~42GB — wasteful | ~32GB — lean |
| CPU/GPU split | 100% GPU | 100% GPU |
| Headroom | +54GB | +64GB |
The default model fits in GPU but wastes ~13.5GB on KV cache for context it rarely uses. Capping to 128K matches qwen3.5:122b-96g and gpt-oss:120b for consistency.
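The KV cache figures in the tables follow from the fact that cache size scales linearly with context length (the per-token KV footprint is fixed for a given model). A quick sanity check of the numbers above:

```python
# KV cache scales linearly with context length, so halving the
# context window halves the cache. Figures are the approximate
# values from the tables above.
full_ctx = 262_144    # default context window (tokens)
capped_ctx = 131_072  # capped context window (128K tokens)
kv_full_gb = 27.0     # ~KV cache at the default context

kv_capped_gb = kv_full_gb * capped_ctx / full_ctx
print(f"{kv_capped_gb:.1f} GB")  # 13.5 GB
```

This is why capping the context frees ~13.5GB on both variants: enough to bring the 122B total under 96GB, and pure headroom on the 27B.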