second_constantine/qwen3-A3B-2507:thinking-30b

244 Downloads Updated 11 months ago

Qwen3-Thinking-2507 continues the Qwen3 thinking model line, with improved quality and depth of reasoning. Qwen3-Instruct-2507 is the updated version of the previous Qwen3 non-thinking mode. Both the thinking and instruct versions are provided, quantized as UD-Q4_K_XL.

tools thinking 30b

ollama run second_constantine/qwen3-A3B-2507:thinking-30b

curl http://localhost:11434/api/chat \
  -d '{
    "model": "second_constantine/qwen3-A3B-2507:thinking-30b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='second_constantine/qwen3-A3B-2507:thinking-30b',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'second_constantine/qwen3-A3B-2507:thinking-30b',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details

c140a12a8cca · 18GB

model

arch qwen3moe · parameters 30.5B · quantization Q4_K_M

18GB

params

{ "min_p": 0, "presence_penalty": 1, "stop": [ "<|im_start|>", "<|im_end

181B
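
The listed parameters are model defaults; the Ollama chat endpoint also accepts an "options" object in the request body. The following is a minimal sketch of building such a request body, assuming only the two values that are fully visible in the truncated listing above (min_p and presence_penalty). The stop sequences are cut off in the listing and are deliberately left out. The helper name build_chat_payload is hypothetical.

```python
import json

# Only the values fully visible in the truncated params listing are used here.
# The stop sequences are cut off in the listing, so they are intentionally omitted.
VISIBLE_PARAMS = {"min_p": 0, "presence_penalty": 1}

MODEL = "second_constantine/qwen3-A3B-2507:thinking-30b"


def build_chat_payload(prompt, overrides=None):
    """Return a JSON-serialisable body for POST /api/chat (hypothetical helper)."""
    options = dict(VISIBLE_PARAMS)   # start from the visible defaults
    options.update(overrides or {})  # per-request overrides take precedence
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,             # ask for one JSON object instead of a stream
        "options": options,
    }


if __name__ == "__main__":
    body = build_chat_payload("Hello!", {"presence_penalty": 1.5})
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d, in the same way as the request shown near the top of the page.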

template

{{- $lastUserIdx := -1 -}} {{- range $idx, $msg := .Messages -}} {{- if eq $msg.Role "user" }}{{ $la

1.5kB

Readme

The thinking version is based on https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF (non-thinking mode does not work with it!)
The instruct version is based on https://huggingface.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF

Feature	Value
vision	false
thinking	by version
tools	true
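
Since tools are listed as supported, here is a rough sketch of what a tool definition and a local dispatcher might look like. The add_numbers tool, the REGISTRY dict and run_tool_calls are all hypothetical names made up for illustration. The message shapes (a "tool_calls" list with "function" name and arguments, and a "tool" role reply with "tool_name") are assumed from the general Ollama chat format and should be checked against the API version in use. The sketch runs against a mocked assistant message, so no server is needed.

```python
def add_numbers(a, b):
    """Hypothetical example tool: add two numbers."""
    return a + b


# Tool schema in the JSON-schema style that the chat API accepts under "tools".
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "add_numbers",
            "description": "Add two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number"},
                    "b": {"type": "number"},
                },
                "required": ["a", "b"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"add_numbers": add_numbers}


def run_tool_calls(message):
    """Execute every known tool call in an assistant message.

    Returns a list of "tool" role messages to append to the conversation.
    Unknown tool names are skipped.
    """
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        handler = REGISTRY.get(fn["name"])
        if handler is None:
            continue
        output = handler(**fn["arguments"])  # arguments assumed to be a dict
        results.append(
            {"role": "tool", "tool_name": fn["name"], "content": str(output)}
        )
    return results


if __name__ == "__main__":
    # Mocked assistant reply, standing in for a real chat response.
    mocked = {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "add_numbers", "arguments": {"a": 2, "b": 3}}}
        ],
    }
    print(run_tool_calls(mocked))
```

In a real exchange, TOOLS would be sent with the chat request, and the messages returned by run_tool_calls would be appended to the conversation before asking the model for its final answer.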

Device	Speed	Version
RTX 3090 24 GB	~105 tokens/s	thinking
M1 Max 32 GB	~51 tokens/s	thinking
RTX 3090 24 GB	~107 tokens/s	non-thinking
M1 Max 32 GB	~53 tokens/s	non-thinking
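
To put the speeds above in perspective, a quick back-of-the-envelope calculation: generation time is roughly the number of output tokens divided by tokens per second. The sketch below uses the approximate speeds from the table; the 2000-token response length is an arbitrary assumption for illustration, and prompt processing time is ignored.

```python
# Approximate generation speeds from the table above, in tokens per second.
SPEEDS = {
    ("RTX 3090", "thinking"): 105,
    ("M1 Max", "thinking"): 51,
    ("RTX 3090", "non-thinking"): 107,
    ("M1 Max", "non-thinking"): 53,
}


def seconds_for(tokens, tokens_per_second):
    """Rough generation time: output tokens divided by generation speed."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 2000  # assumed response length, for illustration only
    for (device, version), speed in SPEEDS.items():
        estimate = seconds_for(response_tokens, speed)
        print(f"{device:9s} {version:13s} ~{estimate:.0f} s for {response_tokens} tokens")
```

Thinking models tend to emit many reasoning tokens before the final answer, so the same question can take noticeably longer on the thinking version even though the per-token speed is nearly identical.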