SimonPu/Qwen3.5:35b-Distilled

Qwen3.5 Series' First Model: The Open-Weight Version of Qwen3.5. As a native vision-language model, Qwen3.5 excels across comprehensive benchmark evaluations in reasoning, programming, agent capabilities, and multimodal understanding, empowering developer

Details

Updated 2 weeks ago

611f3f5e7195 · 21GB ·

model

arch: qwen35moe

parameters: 34.7B

quantization: Q4_K_M

21GB

template

13B

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

12kB

params

{ "min_p": 0, "presence_penalty": 0, "temperature": 0.6, "top_k": 20, "top_p": 0

75B

📢 Announcement:

2026/03/26:

Please note that Qwen3.5:27b-Distilled was generated from the base model using 🌟 Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2.

v2 Update:

Accuracy preserved: Matches the base model on HumanEval (96.91% pass@1)

Shorter reasoning: ~24% reduction in chain-of-thought length

Higher efficiency: 31.6% more correct solutions per token

⚠️Trade-off: −1.24% on HumanEval+ and −7.2% on MMLU-Pro (indicating reduced general-knowledge reasoning performance)

⚠️Note: Due to the scope of SFT data and training focus, the model may underperform the base model on certain tasks requiring long-context understanding or more complex multi-step reasoning. The efficiency and accuracy results reported here are based solely on the HumanEval and HumanEval+ benchmarks. Thank you for your understanding.
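
The efficiency figure is consistent with the other two numbers in the announcement. The card does not say how it was computed, so the following is only a sketch under an assumption: if accuracy is unchanged and the chain-of-thought is about 24% shorter, then correct solutions per token scale by 1 / (1 − 0.24). The function name `per_token_gain` is hypothetical.

```python
def per_token_gain(length_reduction: float, accuracy_ratio: float = 1.0) -> float:
    """Relative change in correct solutions per generated token.

    length_reduction: fractional reduction in output length (0.24 for ~24%).
    accuracy_ratio:   new accuracy divided by old accuracy (1.0 if unchanged).
    Assumption: efficiency = accuracy / tokens; the card does not state its formula.
    """
    return accuracy_ratio / (1.0 - length_reduction) - 1.0


gain = per_token_gain(0.24)
print(f"{gain:.1%}")  # 31.6%
```

Under this reading, the 31.6% gain follows from the shorter reasoning traces alone and is not an independent accuracy improvement.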

Qwen3.5 Highlights

Qwen3.5 features the following enhancements:

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.
Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.
Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.
Next-Generation Training Infrastructure: Near-100% multimodal training efficiency compared to text-only training and asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

For more details, please refer to our blog post Qwen3.5.

Model Parameter Values

These values have been tuned for precise coding tasks:

temperature=0.6
min_p=0.0
presence_penalty=0.0
repetition_penalty=1.0
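
As a minimal sketch, these values can be passed per request through the `options` field of Ollama's `/api/generate` endpoint. Two assumptions: an Ollama server is running at the default local address, and `repetition_penalty` maps to the Ollama option named `repeat_penalty`. The helper names `build_payload` and `generate` are illustrative and not part of any library.

```python
import json
import urllib.request

MODEL = "SimonPu/Qwen3.5:35b-Distilled"
# Assumption: default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Values from the list above; Ollama calls the repetition penalty "repeat_penalty".
SAMPLING_OPTIONS = {
    "temperature": 0.6,
    "min_p": 0.0,
    "presence_penalty": 0.0,
    "repeat_penalty": 1.0,
}


def build_payload(prompt: str) -> dict:
    """Build the JSON body for a single non-streaming generation request."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": dict(SAMPLING_OPTIONS),
    }


def generate(prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the request and return the generated text (needs a running server)."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("Write a Python function that reverses a string.")` would return the model's reply as a string, provided the model has been pulled and the server is reachable.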

Citation

If you find our work helpful, feel free to cite us.

@misc{qwen3.5,
    title  = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month  = {February},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.5}
}

@misc{jackrong_qwen35_opus_distilled,
  title        = {Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2}}