197 Downloads Updated yesterday
ollama run mdq100/qwen3.5-flash:35b
1c80eb582175 · 22GB

A text-only, thinking-capable variant of Qwen3.5-35B-A3B, made leaner and faster by removing the CLIP vision projector. Based on Unsloth’s Q4_K_M quantization of Alibaba’s Qwen3.5-35B-A3B.
Two tags are available under this model:
| Tag | Purpose | Temperature |
|---|---|---|
| mdq100/qwen3.5-flash:35b | General reasoning, chat, instruction following | 1.0 |
| mdq100/qwen3.5-flash:35b-code | Coding via OpenCode or coding assistants | 0.6 |
Same weights. Same architecture. Different temperature.
Qwen3.5-35B-A3B is a hybrid Mixture-of-Experts model from Alibaba’s Qwen team featuring a novel Gated DeltaNet + sparse MoE architecture. Despite 34.7B total parameters, only ~3B are activated per token, making inference efficient.
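The sparse activation described above can be illustrated with a small routing sketch. This is a toy example in pure Python, not the model's actual router: the logits are random placeholders, the expert count and top-k match the card's figures (256 experts, 8 routed), and the shared expert is omitted for brevity.

```python
# Toy sketch of sparse MoE top-k routing (256 experts, 8 routed per token).
# Random logits stand in for a real router network's output.
import math
import random

NUM_EXPERTS = 256  # total routed experts
TOP_K = 8          # experts activated per token

def route(router_logits, k=TOP_K):
    """Return the top-k expert indices with softmax-normalized weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)            # stabilize the softmax
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)
# Only these 8 of 256 experts run for this token; skipping the rest is
# why only ~3B of the 34.7B parameters are active per token.
```

Because only the selected experts' weights participate in the forward pass, per-token compute scales with the active parameters rather than the total.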
This Flash variant strips the CLIP vision projector to produce a clean, text-only model. The LLM weights are unchanged — only vision input is removed.
The original Qwen3.5-35B-A3B includes a 446M-parameter CLIP vision encoder. Removing it:

- Eliminates vision input (no image processing)
- Reduces load time and memory overhead
- Avoids compatibility issues with vision loading in current Ollama versions
- Keeps the full language and reasoning capability intact
OpenCode and similar coding tools don’t support per-session parameter overrides — they use whatever is baked into the Ollama model. A dedicated coding tag avoids manually tuning parameters per session.
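For reference, a tag like `:35b-code` can be built from the base tag with an Ollama Modelfile that bakes the parameters in. A minimal sketch, assuming the base tag is already pulled locally:

```
FROM mdq100/qwen3.5-flash:35b
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
```

The tag is then created with `ollama create mdq100/qwen3.5-flash:35b-code -f Modelfile`.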
Use `:35b` for general reasoning, chat, and instruction following, and `:35b-code` for coding inside OpenCode or similar assistants.

| Property | Value |
|---|---|
| Architecture | qwen35moe (Gated DeltaNet + Gated Attention + sparse MoE) |
| Total parameters | 34.7B |
| Active parameters per token | ~3B |
| Experts | 256 total, 8 routed + 1 shared active |
| Context length | 262,144 tokens |
| Embedding length | 2048 |
| Quantization | Q4_K_M (Unsloth Dynamic 2.0) |
```
ollama pull mdq100/qwen3.5-flash:35b
ollama run mdq100/qwen3.5-flash:35b
```

Parameters:

- temperature: 1.0
- top_p: 0.95
- top_k: 20
- presence_penalty: 1.5
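Beyond the CLI, the model can also be queried through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server running at the default `http://localhost:11434` with the model already pulled:

```python
# Minimal sketch of calling this model via Ollama's /api/generate endpoint
# (assumes a local Ollama server at the default http://localhost:11434).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST the prompt to the local server and return the generated text."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires the server to be running and the model pulled):
# print(generate("mdq100/qwen3.5-flash:35b", "Summarize MoE routing."))
```

Since the sampling parameters above are baked into the tag, the request does not need to set temperature or top_p explicitly.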
```
ollama pull mdq100/qwen3.5-flash:35b-code
ollama run mdq100/qwen3.5-flash:35b-code
```

Parameters:

- temperature: 0.6
- top_p: 0.95
- top_k: 20
Add to your project’s opencode.json:
```json
{
  "model": "ollama/mdq100/qwen3.5-flash:35b-code"
}
```
Or globally in ~/.config/opencode/opencode.json.
Scores below are for the base Qwen3.5-35B-A3B model (BF16, full precision). Q4_K_M quantization may show minor variance (~1-2%).
**Coding**

| Benchmark | Score |
|---|---|
| SWE-bench Verified | 69.2 |
| LiveCodeBench v6 | 74.6 |
| CodeForces Rating | 2028 |
| FullStackBench (en) | 58.1 |
| Terminal Bench 2 | 40.5 |
| OJBench | 36.0 |
**Knowledge & Reasoning**

| Benchmark | Score |
|---|---|
| MMLU-Pro | 85.3 |
| MMLU-Redux | 93.3 |
| GPQA Diamond | 84.2 |
| HLE w/ CoT | 22.4 |
| SuperGPQA | 63.4 |
**Instruction Following**

| Benchmark | Score |
|---|---|
| IFEval | 91.9 |
| IFBench | 70.2 |
| MultiChallenge | 60.0 |
**Long Context**

| Benchmark | Score |
|---|---|
| LongBench v2 | 59.0 |
| AA-LCR | 58.5 |
**Math**

| Benchmark | Score |
|---|---|
| HMMT Feb 25 | 89.0 |
| HMMT Nov 25 | 89.2 |
**Multilingual**

| Benchmark | Score |
|---|---|
| MMMLU | 85.2 |
| MMLU-ProX | 81.0 |
| WMT24++ | 76.3 |
Vision benchmarks (MMMU, MathVision, etc.) are not applicable to this Flash variant.