mannix/
qwq-32b:latest

361 pulls · Updated 6 months ago

QwQ is an experimental research model focused on advancing AI reasoning capabilities. Uses i-matrix quantizations.

Capabilities: tools

Updated 6 months ago

8a21d0911dd3 · 20GB

qwen2 · 32.8B · Q4_K_M

Template: {{- if or .System .Tools }}<|im_start|>system {{- if .System }} {{ .System }} {{- end }} {{- if .Too…
System: You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-b…
License: Apache License, Version 2.0, January 2004
Parameters: { "stop": [ "<|im_start|>", "<|im_end|>" ] }
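The template and stop parameters above follow the ChatML format used by Qwen models. A minimal sketch of how such a prompt is rendered (assumptions: this helper and its exact whitespace are illustrative; the real Ollama template also handles `.Tools` and multi-turn history, which are omitted here):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Render a single-turn ChatML-style prompt, as in this model's template.

    The "<|im_start|>"/"<|im_end|>" markers are the same tokens listed in the
    model's stop parameters, which is why generation halts cleanly at the end
    of the assistant turn.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"  # left open: the model completes this turn
    )

prompt = build_chatml_prompt(
    "You are a helpful and harmless assistant.",
    "What is 2 + 2?",
)
print(prompt)
```

The prompt deliberately ends with an unclosed assistant turn, so the model's own emission of `<|im_end|>` (a configured stop token) terminates the response.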

Readme

  • Quantized from fp32
  • All models use i-matrix calibration (calibration_datav3.txt)
  • The iq2_s and q6_k quants barely work properly

QwQ is a 32B parameter experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities.


QwQ demonstrates remarkable performance across these benchmarks:

  • 65.2% on GPQA, showcasing its graduate-level scientific reasoning capabilities
  • 50.0% on AIME, highlighting its strong mathematical problem-solving skills
  • 90.6% on MATH-500, demonstrating exceptional mathematical comprehension across diverse topics
  • 50.0% on LiveCodeBench, validating its robust programming abilities in real-world scenarios

These results underscore QwQ’s significant advancement in analytical and problem-solving capabilities, particularly in technical domains requiring deep reasoning.

As a preview release, it demonstrates promising analytical abilities but has several important limitations:

  • Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, affecting response clarity.
  • Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.
  • Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
  • Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.
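One practical mitigation for the recursive-reasoning limitation above is to cap generation length when calling the model. A hedged sketch (assumptions: the helper name is hypothetical; `num_predict` and `stop` are documented Ollama request options, and the stop tokens are taken from this model's parameters):

```python
def qwq_options(max_tokens: int = 2048) -> dict:
    """Build an Ollama options dict that bounds runaway reasoning.

    num_predict caps how many tokens the model may generate, so a circular
    reasoning loop cannot run unbounded; the stop list mirrors this model's
    configured stop parameters.
    """
    return {
        "num_predict": max_tokens,
        "stop": ["<|im_start|>", "<|im_end|>"],
    }

print(qwq_options())
```

With a running Ollama server, the dict would be passed as the `options` argument of the official Python client, e.g. `ollama.chat(model="mannix/qwq-32b:latest", messages=[...], options=qwq_options())`.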