Qwen3.5-Flash 35B

A text-only, thinking-capable variant of Qwen3.5-35B-A3B — leaner and faster by removing the CLIP vision projector. Based on Unsloth’s Q4_K_M quantization of Alibaba’s Qwen3.5-35B-A3B.

Two tags are available under this model:

| Tag | Purpose | Temperature |
|---|---|---|
| mdq100/qwen3.5-flash:35b | General reasoning, chat, instruction following | 1.0 |
| mdq100/qwen3.5-flash:35b-code | Coding via OpenCode or coding assistants | 0.6 |

Same weights. Same architecture. Different temperature.


What is this?

Qwen3.5-35B-A3B is a hybrid Mixture-of-Experts model from Alibaba’s Qwen team featuring a novel Gated DeltaNet + sparse MoE architecture. Of its 34.7B total parameters, only ~3B are activated per token, which keeps inference fast.

This Flash variant strips the CLIP vision projector to produce a clean, text-only model. The LLM weights are unchanged — only vision input is removed.
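Conceptually, stripping the vision tower amounts to filtering tensors by name. A toy sketch of the filtering idea only — the `v.` (vision encoder) and `mm.` (projector) prefixes are assumptions based on common GGUF/mmproj naming, and the real conversion operates on GGUF files, not Python lists:

```python
# Toy sketch: keep only language-model tensors from a mixed tensor list.
# Prefixes are assumed from common GGUF naming conventions, not verified
# against this model's actual tensor map.
VISION_PREFIXES = ("v.", "mm.")

def strip_vision(tensor_names):
    """Return tensor names that do not belong to the vision tower."""
    return [n for n in tensor_names if not n.startswith(VISION_PREFIXES)]

tensors = [
    "token_embd.weight",
    "blk.0.attn_q.weight",
    "v.blk.0.attn_q.weight",   # vision encoder block
    "mm.0.weight",             # vision projector
    "output.weight",
]
print(strip_vision(tensors))
# -> ['token_embd.weight', 'blk.0.attn_q.weight', 'output.weight']
```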

Why Flash?

The original Qwen3.5-35B-A3B includes a 446M-parameter CLIP vision encoder. Removing it:

  • Eliminates vision input (no image processing)
  • Reduces load time and memory overhead
  • Avoids compatibility issues with vision loading in current Ollama versions
  • Keeps the full language and reasoning capability intact

Why two tags?

OpenCode and similar coding tools don’t support per-session parameter overrides — they use whatever is baked into the Ollama model. A dedicated coding tag avoids manually tuning parameters per session.

  • Use :35b for general reasoning, chat, and instruction following
  • Use :35b-code for coding inside OpenCode or similar assistants
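If neither tag's defaults fit your tool, you can derive your own tag with a Modelfile — a sketch, with an illustrative tag name and values (the FROM/PARAMETER directives are standard Modelfile syntax):

```
FROM mdq100/qwen3.5-flash:35b
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
```

Then build it with: ollama create my-flash-code -f Modelfile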

Architecture

| Property | Value |
|---|---|
| Architecture | qwen35moe (Gated DeltaNet + Gated Attention + sparse MoE) |
| Total parameters | 34.7B |
| Active parameters per token | ~3B |
| Experts | 256 total; 8 routed + 1 shared active |
| Context length | 262,144 tokens |
| Embedding length | 2048 |
| Quantization | Q4_K_M (Unsloth Dynamic 2.0) |
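The sparse routing in the table can be illustrated in miniature: a router scores every expert per token, only the top-k run, and a shared expert is always active. A pure-Python sketch with made-up gate scores — not the model's actual router, just the technique:

```python
import math
import random

NUM_EXPERTS, TOP_K = 256, 8  # per the architecture table above

def route(gate_logits, top_k=TOP_K):
    """Pick the top-k experts by softmax score and renormalize their weights."""
    m = max(gate_logits)
    exps = [math.exp(x - m) for x in gate_logits]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    z = sum(probs[i] for i in top)
    return {i: probs[i] / z for i in top}  # expert index -> routing weight

random.seed(0)
weights = route([random.gauss(0, 1) for _ in range(NUM_EXPERTS)])
active = list(weights) + ["shared"]  # the shared expert always runs
print(len(active))  # 9 experts touch this token, out of 256
```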

Capabilities

  • Completion
  • Tool calling
  • Thinking (extended reasoning mode)

Usage

General purpose

ollama pull mdq100/qwen3.5-flash:35b
ollama run mdq100/qwen3.5-flash:35b

Parameters:

temperature: 1.0
top_p: 0.95
top_k: 20
presence_penalty: 1.5
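These same defaults can be supplied per request when calling the Ollama HTTP API directly. A sketch of a /api/generate request body (the prompt is illustrative; field names follow Ollama's documented options object):

```json
{
  "model": "mdq100/qwen3.5-flash:35b",
  "prompt": "Explain Mixture-of-Experts routing in two sentences.",
  "options": {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 1.5
  }
}
```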

Coding optimized

ollama pull mdq100/qwen3.5-flash:35b-code
ollama run mdq100/qwen3.5-flash:35b-code

Parameters:

temperature: 0.6
top_p: 0.95
top_k: 20

OpenCode

Add to your project’s opencode.json:

{
  "model": "ollama/mdq100/qwen3.5-flash:35b-code"
}

Or globally in ~/.config/opencode/opencode.json.


Benchmarks

Scores below are for the base Qwen3.5-35B-A3B model (BF16, full precision). Q4_K_M quantization may show minor variance (~1-2%).

Coding

| Benchmark | Score |
|---|---|
| SWE-bench Verified | 69.2 |
| LiveCodeBench v6 | 74.6 |
| CodeForces Rating | 2028 |
| FullStackBench (en) | 58.1 |
| Terminal Bench 2 | 40.5 |
| OJBench | 36.0 |

Knowledge & Reasoning

| Benchmark | Score |
|---|---|
| MMLU-Pro | 85.3 |
| MMLU-Redux | 93.3 |
| GPQA Diamond | 84.2 |
| HLE w/ CoT | 22.4 |
| SuperGPQA | 63.4 |

Instruction Following

| Benchmark | Score |
|---|---|
| IFEval | 91.9 |
| IFBench | 70.2 |
| MultiChallenge | 60.0 |

Long Context

| Benchmark | Score |
|---|---|
| LongBench v2 | 59.0 |
| AA-LCR | 58.5 |

Math & STEM

| Benchmark | Score |
|---|---|
| HMMT Feb 25 | 89.0 |
| HMMT Nov 25 | 89.2 |

Multilingual

| Benchmark | Score |
|---|---|
| MMMLU | 85.2 |
| MMLU-ProX | 81.0 |
| WMT24++ | 76.3 |

Vision benchmarks (MMMU, MathVision, etc.) are not applicable to this Flash variant.


Unsloth Improvements

  • Unsloth Dynamic 2.0 — improved quantization algorithm that preserves more accuracy at lower bit depth, outperforming standard Q4_K_M
  • Improved imatrix calibration — curated calibration data covering chat, coding, long context, and tool-calling
  • Reduced KL divergence — output distribution stays closer to the full-precision original
  • Early availability — published within hours of the original model release
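The KL-divergence claim is a measurable one: compare next-token distributions from the quantized and full-precision models over a shared prompt set. A toy illustration of the metric itself, using made-up distributions (not real measurements from either model):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats; P is the full-precision reference distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up next-token probability distributions for illustration only.
full_precision = [0.70, 0.20, 0.05, 0.05]
good_quant     = [0.68, 0.21, 0.06, 0.05]  # stays close to the original
bad_quant      = [0.45, 0.35, 0.10, 0.10]  # drifts further away

# Lower divergence means the quantized model's outputs track the original.
print(kl_divergence(full_precision, good_quant) <
      kl_divergence(full_precision, bad_quant))
# -> True
```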

Credits

  • Original model: Qwen3.5-35B-A3B by Alibaba Qwen Team
  • GGUF quantization: Unsloth
  • Flash variant (vision removed, coding tag temperature-optimized): packaged for Ollama + OpenCode