mdq100/ qwen3.5:27b-96g

41 pulls · 2 days ago

Custom Qwen3.5 variants optimized for 128GB unified-memory systems such as the AMD Ryzen AI Max+ 395. On Windows 11, the GPU is limited to 96GB (32GB is reserved for the OS/CPU), so the context window is capped at 131,072 tokens (128K) to fit within GPU memory limits.

vision tools thinking
ollama run mdq100/qwen3.5:27b-96g

Details

14ba371907db · 17GB · qwen35 · 27.8B · Q4_K_M

License: Apache License, Version 2.0
Parameters: num_ctx 131072 · presence_penalty 1.5 · temperature 1 · top_k 20 · …
Template: {{ .Prompt }}

Readme

These two custom models work well with OpenCode for agentic coding.

Background

Custom Qwen3.5 variants optimized for 128GB unified-memory systems such as the AMD Ryzen AI Max+ 395. On Windows 11, the GPU is limited to 96GB (32GB is reserved for the OS/CPU), so the context window is capped at 131,072 tokens (128K) to fit within GPU memory limits without timeouts. All variants retain full capabilities: vision, tools, and thinking mode.

Tags:

  • 27b-96g — Qwen3.5 27B dense (Q4_K_M), ~32GB GPU, 100% GPU execution
  • 122b-96g — Qwen3.5 122B MoE (Q4_K_M), ~92GB GPU, stable on 96GB unified memory

Base models: qwen3.5:27b, qwen3.5:122b

Hardware tested: GMKtec EVO-X2 AI Mini PC, AMD Ryzen AI Max+ 395, AMD Radeon 8060S, 128GB LPDDR5X-8000, Windows 11 Pro

Motivation for qwen3.5:122b-96g Custom Model

A custom version of qwen3.5:122b with the context window capped at 128K so it fits within 96GB of GPU memory, matching gpt-oss:120b’s context window.
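A variant like this can be reproduced with a short Ollama Modelfile that inherits the base model and overrides num_ctx. This is a minimal sketch, not necessarily the exact Modelfile used to publish these tags; sampling parameters such as temperature and top_k (shown in the Details section) could be pinned the same way with additional PARAMETER lines.

```
# Modelfile — minimal sketch for a capped-context 122b variant
FROM qwen3.5:122b

# Cap the context window at 128K so the KV cache fits in 96GB
PARAMETER num_ctx 131072
```

Build and run it with `ollama create qwen3.5:122b-96g -f Modelfile` followed by `ollama run qwen3.5:122b-96g`; swapping in `FROM qwen3.5:27b` produces the 27b variant the same way.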

Why This Is Needed

|                | Default           | Custom                  |
| -------------- | ----------------- | ----------------------- |
| Model          | qwen3.5:122b      | qwen3.5:122b-96g        |
| Context window | 262,144 tokens    | 131,072 tokens (128K)   |
| Weights        | ~70GB             | ~70GB                   |
| KV cache       | ~27GB             | ~13.5GB                 |
| Total size     | ~97GB — timeout   | ~92GB — fits            |
| CPU/GPU split  | all GPU (OOM)     | 28% CPU / 72% GPU       |
| Headroom       | -1GB              | +4GB                    |

Motivation for qwen3.5:27b-96g Custom Model

A custom version of qwen3.5:27b with the context window capped at 128K to avoid unnecessary KV-cache memory usage on a 96GB GPU.

Why This Is Needed

|                | Default           | Custom                  |
| -------------- | ----------------- | ----------------------- |
| Model          | qwen3.5:27b       | qwen3.5:27b-96g         |
| Context window | 262,144 tokens    | 131,072 tokens (128K)   |
| Weights        | ~15GB             | ~15GB                   |
| KV cache       | ~27GB             | ~13.5GB                 |
| Total GPU      | ~42GB — wasteful  | ~32GB — lean            |
| CPU/GPU split  | 100% GPU          | 100% GPU                |
| Headroom       | +54GB             | +64GB                   |

The default model fits on the GPU but wastes ~13.5GB of KV cache on context it rarely uses. Capping the window at 128K matches qwen3.5:122b-96g and gpt-oss:120b for consistency.
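The KV-cache figures in both tables follow from the fact that cache size scales linearly with num_ctx, which is why halving the context halves the cache. The sketch below shows the arithmetic; the layer/head/dimension numbers are illustrative assumptions chosen to reproduce the ~27GB figure, not Qwen3.5’s published architecture.

```python
# Rough KV-cache estimate: 2 (K and V) * layers * kv_heads * head_dim
# * context_length * bytes_per_element (fp16 = 2 bytes).
# Architecture numbers are illustrative assumptions, not Qwen3.5's config.
def kv_cache_bytes(num_ctx, layers=54, kv_heads=4, head_dim=128, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * num_ctx * dtype_bytes

full = kv_cache_bytes(262_144)    # default 256K context
capped = kv_cache_bytes(131_072)  # capped 128K context

# Cache size is linear in num_ctx, so halving the context halves the cache.
print(f"256K ctx: {full / 2**30:.1f} GiB, 128K ctx: {capped / 2**30:.1f} GiB")
# prints "256K ctx: 27.0 GiB, 128K ctx: 13.5 GiB"
```

The saved ~13.5GB is what turns the 122b model from a timeout at ~97GB into a stable fit at ~92GB, and what gives the 27b model its extra headroom.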