ollama run mdq100/qwen3.5:27b-96g
ollama launch claude --model mdq100/qwen3.5:27b-96g
ollama launch codex --model mdq100/qwen3.5:27b-96g
ollama launch opencode --model mdq100/qwen3.5:27b-96g
ollama launch openclaw --model mdq100/qwen3.5:27b-96g
Custom Qwen3.5 variants optimized for 128GB unified memory systems such as the AMD Ryzen AI Max+ 395. On Windows 11, the GPU is limited to 96GB (32GB is reserved for the OS/CPU), so the context window is capped at 131,072 tokens (128K) to fit within GPU memory without timing out. All variants retain full capabilities: vision, tools, and thinking mode.
Base models: qwen3.5:27b, qwen3.5:122b
Hardware tested: GMKtec EVO-X2 AI Mini PC, AMD Ryzen AI Max+ 395, AMD Radeon 8060S, 128GB LPDDR5X-8000, Windows 11 Pro
Custom version of qwen3.5:122b with the context window capped at 128K so the model fits within 96GB of GPU memory, matching gpt-oss:120b’s context window.
| | Default | Custom |
|---|---|---|
| Model | qwen3.5:122b | qwen3.5:122b-96g |
| Context window | 262,144 tokens | 131,072 tokens (128K) |
| Weights | ~70GB | ~70GB |
| KV cache | ~27GB | ~13.5GB |
| Total size | ~97GB — timeout | ~92GB — fits |
| CPU/GPU split | all GPU (OOM) | 28% CPU / 72% GPU |
| Headroom | -1GB | +4GB |
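As a sketch of how a capped variant like this can be built: an Ollama Modelfile that sets `num_ctx` (Ollama’s context-window parameter) to 131,072 does the job. The base tag below assumes `qwen3.5:122b` is available locally.

```
FROM qwen3.5:122b
PARAMETER num_ctx 131072
```

Then `ollama create qwen3.5:122b-96g -f Modelfile` produces the custom tag; the same pattern applies to the 27B variant.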
Custom version of qwen3.5:27b with the context window capped at 128K to reduce unnecessary KV cache memory usage on a 96GB GPU.
| | Default | Custom |
|---|---|---|
| Model | qwen3.5:27b | qwen3.5:27b-96g |
| Context window | 262,144 tokens | 131,072 tokens (128K) |
| Weights | ~15GB | ~15GB |
| KV cache | ~27GB | ~13.5GB |
| Total GPU | ~42GB — wasteful | ~32GB — lean |
| CPU/GPU split | 100% GPU | 100% GPU |
| Headroom | +54GB | +64GB |
The default model fits in GPU but wastes ~13.5GB on KV cache for context it rarely uses. Capping to 128K matches qwen3.5:122b-96g and gpt-oss:120b for consistency.
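The KV cache figures in the tables follow from the fact that cache size scales linearly with context length (the per-token KV footprint is fixed for a given model). A quick sanity check of the numbers above:

```python
# KV cache scales linearly with context length, so halving the
# context window halves the cache. Figures are the approximate
# values from the tables above.
full_ctx = 262_144    # default context window (tokens)
capped_ctx = 131_072  # capped context window (128K tokens)
kv_full_gb = 27.0     # ~KV cache at the default context

kv_capped_gb = kv_full_gb * capped_ctx / full_ctx
print(f"{kv_capped_gb:.1f} GB")  # 13.5 GB
```

This is why capping the context frees ~13.5GB on both variants: enough to bring the 122B total under 96GB, and pure headroom on the 27B.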