
This is a continuation of the Qwen3 thinking model (MoE), with improved reasoning quality and depth. It is quantized as UD-Q4_K_XL, and thinking is always on (it cannot be switched off).

tools thinking 30b


Readme

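| Feature  | Value                          |
|----------|--------------------------------|
| vision   | false                          |
| thinking | true (cannot be switched off)  |
| tools    | true                           |

| Device         | Speed (token/s) | Context (tokens) | VRAM (GB) | Versions           |
|----------------|-----------------|------------------|-----------|--------------------|
| RTX 3090 24 GB | ~98             | 4096             | 18        | UD-Q4_K_XL, 0.12.2 |
| RTX 3090 24 GB | ~97             | 15360            | 20        | UD-Q4_K_XL, 0.12.2 |
| RTX 3090 24 GB | ~87             | 4096             | 17        | IQ4_XS, 0.12.3     |
| RTX 3090 24 GB | ~84             | 15360            | 18        | IQ4_XS, 0.12.3     |
| M1 Max 32 GB   | ~49             | 4096             | 18        | UD-Q4_K_XL, 0.12.2 |
| M1 Max 32 GB   | ~46             | 15360            | 18        | UD-Q4_K_XL, 0.12.2 |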
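
The readme does not say how the speeds above were measured. As a rough sketch for reproducing a comparable number, assume the runtime is Ollama, whose generate responses report `eval_count` (generated tokens) and `eval_duration` (nanoseconds). Under that assumption, token/s is the count divided by the duration in seconds. The helper name `tokens_per_second` and the sample values are hypothetical, not measurements from this table.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Return generation speed in token/s.

    Assumes Ollama-style metrics: eval_count is the number of generated
    tokens and eval_duration_ns is the generation time in nanoseconds.
    """
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    # Convert nanoseconds to seconds by scaling the numerator by 1e9.
    return eval_count * 1e9 / eval_duration_ns


# Hypothetical response fields for illustration only (not real benchmark data).
sample = {"eval_count": 512, "eval_duration": 8_000_000_000}
speed = tokens_per_second(sample["eval_count"], sample["eval_duration"])
print(f"{speed:.1f} token/s")
```

Averaging this value over several prompts at a fixed context size would give a figure closer in spirit to the approximate (`~`) speeds listed in the table.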