39 1 month ago

A continuation of the Qwen3 thinking model (MoE) with improved reasoning quality and depth. Quantized as UD-Q4_K_XL; thinking is always on and cannot be switched off.

tools thinking 30b


c140a12a8cca · 18GB · qwen3moe · 30.5B · Q4_K_M
{ "min_p": 0, "presence_penalty": 1, "stop": [ "<|im_start|>", "<|im_end
{{- $lastUserIdx := -1 -}} {{- range $idx, $msg := .Messages -}} {{- if eq $msg.Role "user" }}{{ $la

Readme

| Feature | Value |
|---|---|
| vision | false |
| thinking | true (cannot be switched off) |
| tools | true |


| Device | Speed (tokens/s) | Context | VRAM (GB) | Versions |
|---|---|---|---|---|
| RTX 3090 24 GB | ~98 | 4096 | 18 | UD-Q4_K_XL, 0.12.2 |
| RTX 3090 24 GB | ~97 | 15360 | 20 | UD-Q4_K_XL, 0.12.2 |
| RTX 3090 24 GB | ~87 | 4096 | 17 | IQ4_XS, 0.12.3 |
| RTX 3090 24 GB | ~84 | 15360 | 18 | IQ4_XS, 0.12.3 |
| M1 Max 32 GB | ~49 | 4096 | 18 | UD-Q4_K_XL, 0.12.2 |
| M1 Max 32 GB | ~46 | 15360 | 18 | UD-Q4_K_XL, 0.12.2 |
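
The README does not say how the speed figures were measured. One plausible way to get a comparable number on your own hardware is to take the `eval_count` and `eval_duration` fields that Ollama returns with a completed generation (duration is in nanoseconds) and divide one by the other. The sketch below assumes that method; the `sample` values are purely illustrative and are not measurements from this model.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama-style generation stats into a tokens/s rate.

    eval_count       -- number of tokens generated
    eval_duration_ns -- time spent generating them, in nanoseconds
    """
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Illustrative stats only (assumed shape of a final response object):
# 200 tokens generated in 4 seconds.
sample = {"eval_count": 200, "eval_duration": 4_000_000_000}

rate = tokens_per_second(sample["eval_count"], sample["eval_duration"])
print(f"{rate:.1f} tokens/s")
```

Because the model always emits thinking tokens, those tokens count toward `eval_count`, so the rate reflects total generation speed rather than the speed of the visible answer alone.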