197 Downloads Updated yesterday
ollama run mdq100/qwen3.5-flash:35b
1c80eb582175 · 22GB

A text-only, thinking-capable variant of Qwen3.5-35B-A3B, made leaner and faster by removing the CLIP vision projector. Based on Unsloth’s Q4_K_M quantization of Alibaba’s Qwen3.5-35B-A3B.
Two tags are available under this model:
| Tag | Purpose | Temperature |
|---|---|---|
| mdq100/qwen3.5-flash:35b | General reasoning, chat, instruction following | 1.0 |
| mdq100/qwen3.5-flash:35b-code | Coding via OpenCode or coding assistants | 0.6 |
Same weights. Same architecture. Different temperature.
Qwen3.5-35B-A3B is a hybrid Mixture-of-Experts model from Alibaba’s Qwen team featuring a novel Gated DeltaNet + sparse MoE architecture. Despite 34.7B total parameters, only ~3B are activated per token, making inference efficient.
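The sparse activation described above can be illustrated with a small routing sketch. This is a toy example in pure Python, not the model's actual router: the logits are random placeholders, the expert count and top-k match the card's figures (256 experts, 8 routed), and the shared expert is omitted for brevity.

```python
# Toy sketch of sparse MoE top-k routing (256 experts, 8 routed per token).
# Random logits stand in for a real router network's output.
import math
import random

NUM_EXPERTS = 256  # total routed experts
TOP_K = 8          # experts activated per token

def route(router_logits, k=TOP_K):
    """Return the top-k expert indices with softmax-normalized weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)            # stabilize the softmax
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)
# Only these 8 of 256 experts run for this token; skipping the rest is
# why only ~3B of the 34.7B parameters are active per token.
```

Because only the selected experts' weights participate in the forward pass, per-token compute scales with the active parameters rather than the total.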
This Flash variant strips the CLIP vision projector to produce a clean, text-only model. The LLM weights are unchanged — only vision input is removed.
The original Qwen3.5-35B-A3B includes a 446M-parameter CLIP vision encoder. Removing it:

- Eliminates vision input (no image processing)
- Reduces load time and memory overhead
- Avoids compatibility issues with vision loading in current Ollama versions
- Keeps the full language and reasoning capability intact
OpenCode and similar coding tools don’t support per-session parameter overrides — they use whatever is baked into the Ollama model. A dedicated coding tag avoids manually tuning parameters per session.
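For reference, a tag like `:35b-code` can be built from the base tag with an Ollama Modelfile that bakes the parameters in. A minimal sketch, assuming the base tag is already pulled locally:

```
FROM mdq100/qwen3.5-flash:35b
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
```

The tag is then created with `ollama create mdq100/qwen3.5-flash:35b-code -f Modelfile`.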
Use `:35b` for general reasoning, chat, and instruction following, and `:35b-code` for coding inside OpenCode or similar assistants.

| Property | Value |
|---|---|
| Architecture | qwen35moe (Gated DeltaNet + Gated Attention + sparse MoE) |
| Total parameters | 34.7B |
| Active parameters per token | ~3B |
| Experts | 256 total, 8 routed + 1 shared active |
| Context length | 262,144 tokens |
| Embedding length | 2048 |
| Quantization | Q4_K_M (Unsloth Dynamic 2.0) |
```
ollama pull mdq100/qwen3.5-flash:35b
ollama run mdq100/qwen3.5-flash:35b
```

Parameters:

- temperature: 1.0
- top_p: 0.95
- top_k: 20
- presence_penalty: 1.5
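Beyond the CLI, the model can also be queried through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server running at the default `http://localhost:11434` with the model already pulled:

```python
# Minimal sketch of calling this model via Ollama's /api/generate endpoint
# (assumes a local Ollama server at the default http://localhost:11434).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST the prompt to the local server and return the generated text."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires the server to be running and the model pulled):
# print(generate("mdq100/qwen3.5-flash:35b", "Summarize MoE routing."))
```

Since the sampling parameters above are baked into the tag, the request does not need to set temperature or top_p explicitly.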
```
ollama pull mdq100/qwen3.5-flash:35b-code
ollama run mdq100/qwen3.5-flash:35b-code
```

Parameters:

- temperature: 0.6
- top_p: 0.95
- top_k: 20
Add to your project’s opencode.json:
```json
{
  "model": "ollama/mdq100/qwen3.5-flash:35b-code"
}
```
Or globally in ~/.config/opencode/opencode.json.
Scores below are for the base Qwen3.5-35B-A3B model (BF16, full precision). Q4_K_M quantization may show minor variance (~1-2%).
**Coding**

| Benchmark | Score |
|---|---|
| SWE-bench Verified | 69.2 |
| LiveCodeBench v6 | 74.6 |
| CodeForces Rating | 2028 |
| FullStackBench (en) | 58.1 |
| Terminal Bench 2 | 40.5 |
| OJBench | 36.0 |
**Knowledge & Reasoning**

| Benchmark | Score |
|---|---|
| MMLU-Pro | 85.3 |
| MMLU-Redux | 93.3 |
| GPQA Diamond | 84.2 |
| HLE w/ CoT | 22.4 |
| SuperGPQA | 63.4 |
**Instruction Following**

| Benchmark | Score |
|---|---|
| IFEval | 91.9 |
| IFBench | 70.2 |
| MultiChallenge | 60.0 |
**Long Context**

| Benchmark | Score |
|---|---|
| LongBench v2 | 59.0 |
| AA-LCR | 58.5 |
**Math**

| Benchmark | Score |
|---|---|
| HMMT Feb 25 | 89.0 |
| HMMT Nov 25 | 89.2 |
**Multilingual**

| Benchmark | Score |
|---|---|
| MMMLU | 85.2 |
| MMLU-ProX | 81.0 |
| WMT24++ | 76.3 |
Vision benchmarks (MMMU, MathVision, etc.) are not applicable to this Flash variant.