525 Downloads Updated 3 months ago
cf3d85b604d6 · 18GB
We introduce the updated versions of the Qwen3-30B-A3B models, named Qwen3-30B-A3B-Instruct-2507 and Qwen3-30B-A3B-Thinking-2507, featuring the following key enhancements:


This repo contains the Q4_K_XL versions of both Qwen3-30B-A3B-Instruct-2507 and Qwen3-30B-A3B-Thinking-2507.
NOTE: The Instruct model supports only non-thinking mode and does not generate <think></think> blocks in its output. Specifying enable_thinking=False is no longer required.
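Because the Thinking variant does emit <think></think> blocks while the Instruct variant does not, downstream code often wants a single post-processing path. A minimal sketch (a hypothetical helper, not part of the Qwen tooling) that strips any reasoning blocks before further use:

```python
import re

def strip_think_blocks(text: str) -> str:
    """Remove <think>...</think> blocks from Thinking-model output.
    Instruct-2507 output contains no such blocks, so it passes through
    unchanged."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

print(strip_think_blocks("<think>step 1: recall facts</think>Answer: 42"))
```

With this in place, responses from either variant can be handled identically.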
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.
To achieve optimal performance, we recommend the following settings:
Sampling Parameters:
- For the Instruct model: Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.
- For the Thinking model: Temperature=0.6, TopP=0.95, TopK=20, and MinP=0.
- Set the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, a higher value may occasionally result in language mixing and a slight decrease in model performance.

Adequate Output Length: We recommend using an output length of 16,384 tokens for most queries, which is adequate for instruct models.
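The recommended settings above can be bundled into generation kwargs. This is a hypothetical helper (not part of any official tooling); the key names follow the OpenAI-compatible API exposed by servers such as vLLM or llama.cpp:

```python
# Recommended sampling settings from the model card, per variant.
RECOMMENDED = {
    "instruct": {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},
    "thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
}

def sampling_kwargs(variant: str, presence_penalty: float = 0.0) -> dict:
    """Build generation kwargs for 'instruct' or 'thinking'.
    presence_penalty may be raised toward 2.0 to curb endless repetition,
    at some risk of language mixing."""
    if not 0.0 <= presence_penalty <= 2.0:
        raise ValueError("presence_penalty should stay between 0 and 2")
    kwargs = dict(RECOMMENDED[variant])
    kwargs["presence_penalty"] = presence_penalty
    kwargs["max_tokens"] = 16_384  # adequate output length for most queries
    return kwargs
```

For example, `sampling_kwargs("thinking", presence_penalty=1.0)` can be passed directly as request parameters to an OpenAI-compatible endpoint.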
Standardize Output Format: We recommend using prompts to standardize model outputs when benchmarking.
For multiple-choice questions, prompt the model to show its choice in the answer field with only the choice letter, e.g., "answer": "C".

If you find our work helpful, feel free to give us a cite.
@misc{qwen3technicalreport,
title={Qwen3 Technical Report},
author={Qwen Team},
year={2025},
eprint={2505.09388},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.09388},
}
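When benchmarking with the standardized answer format above, the choice letter can be recovered with a small parser. A sketch (a hypothetical helper, not part of the Qwen tooling):

```python
import re

def extract_choice(model_output: str):
    """Find an '"answer": "C"'-style field anywhere in the output and
    return the choice letter, or None if no such field is present."""
    match = re.search(r'"answer"\s*:\s*"([A-Za-z])"', model_output)
    return match.group(1) if match else None
```

Matching with a regular expression rather than json.loads tolerates surrounding prose, which models often emit around the structured field.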