mdq100/ qwen3.5:27b-96g

41 pulls · 2 days ago

Custom Qwen3.5 variants optimized for 128GB unified-memory systems such as the AMD Ryzen AI Max+ 395. On Windows 11, the GPU is limited to 96GB (32GB is reserved for the OS/CPU), so the context window is capped at 131,072 tokens (128K) to fit within GPU memory limits.

vision tools thinking
ollama run mdq100/qwen3.5:27b-96g

Details

14ba371907db · 17GB · qwen35 · 27.8B · Q4_K_M

License: Apache License, Version 2.0
Parameters: num_ctx 131072 · presence_penalty 1.5 · temperature 1 · top_k 20 · …
Template: {{ .Prompt }}

Readme

These two custom models work well with OpenCode for agentic coding.

Background

Custom Qwen3.5 variants optimized for 128GB unified-memory systems such as the AMD Ryzen AI Max+ 395. On Windows 11, the GPU is limited to 96GB (32GB is reserved for the OS/CPU), so the context window is capped at 131,072 tokens (128K) to fit within GPU memory limits without timeouts. All variants retain full capabilities: vision, tools, and thinking mode.

Tags:

  • 27b-96g — Qwen3.5 27B dense (Q4_K_M), ~32GB GPU, 100% GPU execution
  • 122b-96g — Qwen3.5 122B MoE (Q4_K_M), ~92GB GPU, stable on 96GB unified memory

Base models: qwen3.5:27b, qwen3.5:122b

Hardware tested: GMKtec EVO-X2 AI Mini PC, AMD Ryzen AI Max+ 395, AMD Radeon 8060S, 128GB LPDDR5X-8000, Windows 11 Pro

Motivation for qwen3.5:122b-96g Custom Model

A custom version of qwen3.5:122b with the context window capped at 128K so it fits within 96GB of GPU memory, matching gpt-oss:120b’s context window.
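A variant like this can be reproduced with a short Ollama Modelfile that inherits the base model and overrides num_ctx. This is a minimal sketch, not necessarily the exact Modelfile used to publish these tags; sampling parameters such as temperature and top_k (shown in the Details section) could be pinned the same way with additional PARAMETER lines.

```
# Modelfile — minimal sketch for a capped-context 122b variant
FROM qwen3.5:122b

# Cap the context window at 128K so the KV cache fits in 96GB
PARAMETER num_ctx 131072
```

Build and run it with `ollama create qwen3.5:122b-96g -f Modelfile` followed by `ollama run qwen3.5:122b-96g`; swapping in `FROM qwen3.5:27b` produces the 27b variant the same way.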

Why This Is Needed

|                | Default           | Custom                  |
| -------------- | ----------------- | ----------------------- |
| Model          | qwen3.5:122b      | qwen3.5:122b-96g        |
| Context window | 262,144 tokens    | 131,072 tokens (128K)   |
| Weights        | ~70GB             | ~70GB                   |
| KV cache       | ~27GB             | ~13.5GB                 |
| Total size     | ~97GB — timeout   | ~92GB — fits            |
| CPU/GPU split  | all GPU (OOM)     | 28% CPU / 72% GPU       |
| Headroom       | -1GB              | +4GB                    |

Motivation for qwen3.5:27b-96g Custom Model

A custom version of qwen3.5:27b with the context window capped at 128K to avoid unnecessary KV-cache memory usage on a 96GB GPU.

Why This Is Needed

|                | Default           | Custom                  |
| -------------- | ----------------- | ----------------------- |
| Model          | qwen3.5:27b       | qwen3.5:27b-96g         |
| Context window | 262,144 tokens    | 131,072 tokens (128K)   |
| Weights        | ~15GB             | ~15GB                   |
| KV cache       | ~27GB             | ~13.5GB                 |
| Total GPU      | ~42GB — wasteful  | ~32GB — lean            |
| CPU/GPU split  | 100% GPU          | 100% GPU                |
| Headroom       | +54GB             | +64GB                   |

The default model fits on the GPU but wastes ~13.5GB of KV cache on context it rarely uses. Capping the window at 128K matches qwen3.5:122b-96g and gpt-oss:120b for consistency.
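The KV-cache figures in both tables follow from the fact that cache size scales linearly with num_ctx, which is why halving the context halves the cache. The sketch below shows the arithmetic; the layer/head/dimension numbers are illustrative assumptions chosen to reproduce the ~27GB figure, not Qwen3.5’s published architecture.

```python
# Rough KV-cache estimate: 2 (K and V) * layers * kv_heads * head_dim
# * context_length * bytes_per_element (fp16 = 2 bytes).
# Architecture numbers are illustrative assumptions, not Qwen3.5's config.
def kv_cache_bytes(num_ctx, layers=54, kv_heads=4, head_dim=128, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * num_ctx * dtype_bytes

full = kv_cache_bytes(262_144)    # default 256K context
capped = kv_cache_bytes(131_072)  # capped 128K context

# Cache size is linear in num_ctx, so halving the context halves the cache.
print(f"256K ctx: {full / 2**30:.1f} GiB, 128K ctx: {capped / 2**30:.1f} GiB")
# prints "256K ctx: 27.0 GiB, 128K ctx: 13.5 GiB"
```

The saved ~13.5GB is what turns the 122b model from a timeout at ~97GB into a stable fit at ~92GB, and what gives the 27b model its extra headroom.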