stewartpark/qwen3.5

stewartpark/qwen3.5:latest

1,391 Downloads Updated 2 months ago

Qwen3.5 model (27B dense) with always-on thinking, optimized for agentic coding, tool use, and browser automation.

vision tools thinking 27b

ollama run stewartpark/qwen3.5

curl http://localhost:11434/api/chat \
  -d '{
    "model": "stewartpark/qwen3.5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='stewartpark/qwen3.5',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'stewartpark/qwen3.5',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details

01a10c7ef22d · 56GB

model

arch qwen35

parameters 27.8B

quantization F16

56GB

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

11kB

params

{ "min_p": 0, "num_ctx": 262144, "num_predict": 65536, "presence_penalty": 1.5,

133B

Readme

This is a customized Qwen3.5 collection designed for agentic workflows, available in the following configurations:

qwen3.5:27b-bf16 (default) — 27B dense parameters in full BF16 precision (~56GB). Native 256K context window with always-enabled thinking mode. Maximum quality with no quantization loss. Requires 56GB+ VRAM (RTX A6000 + CPU offloading, A100/H100).
qwen3.5:27b-q8_0 — 27B dense parameters quantized to Q8_0 (~30GB). Native 256K context window with always-enabled thinking mode. Near-lossless quality at half the VRAM of BF16, accessible on 24GB GPUs (RTX 3090 / 4090) with minimal CPU offloading.
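
The sizes quoted for the two variants can be sanity-checked with a rough weight-only estimate. The sketch below takes the 27.8B parameter count from the Details panel and assumes 2 bytes per weight for BF16 and the GGML Q8_0 block layout (32 int8 weights plus one 2-byte scale, 34 bytes per block). The helper names `weight_gb` and `offload_gb` are made up for this illustration, and the figures ignore the KV cache, activations, and runtime overhead, so real memory use will be higher.

```python
# Rough weight-only memory estimate for the two variants.
# Ignores KV cache, activations, and runtime overhead.

PARAMS = 27.8e9  # parameter count from the Details panel

# Assumed bytes per weight:
#   bf16: 2 bytes per weight
#   q8_0: GGML block of 32 int8 weights + one 2-byte fp16 scale = 34 bytes
BYTES_PER_WEIGHT = {
    "bf16": 2.0,
    "q8_0": 34 / 32,
}


def weight_gb(params: float, fmt: str) -> float:
    """Approximate size of the weights in decimal gigabytes."""
    return params * BYTES_PER_WEIGHT[fmt] / 1e9


def offload_gb(params: float, fmt: str, vram_gb: float) -> float:
    """Weights that would not fit in the given VRAM (0 if everything fits)."""
    return max(0.0, weight_gb(params, fmt) - vram_gb)


if __name__ == "__main__":
    for fmt in BYTES_PER_WEIGHT:
        print(f"{fmt}: ~{weight_gb(PARAMS, fmt):.1f} GB of weights")
    # How much of the Q8_0 weights would spill off a 24GB card
    print(f"q8_0 on 24GB: ~{offload_gb(PARAMS, 'q8_0', 24):.1f} GB offloaded")
```

Under these assumptions the estimate lands close to the ~56GB and ~30GB figures above, and it suggests that a 24GB card has to offload several gigabytes of Q8_0 weights before any context is allocated.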

All variants ship with preset sampling parameters (temperature 0.6, top_p 0.95, top_k 20) and support up to 32K output tokens. The models handle multi-step tool calling, autonomous browser operations, and complex coding tasks.
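
The sampling values above are baked into the model, so passing them explicitly should only be needed when a client wants to override or pin them. As a minimal sketch, the same `/api/chat` endpoint used in the curl example accepts an `options` object. The helper names `build_chat_request` and `send` are hypothetical. The 32768-token cap is taken from the "32K output tokens" statement; the params panel on this page lists `num_predict` as 65536, so the built-in default may differ.

```python
import json
import urllib.request

# Default local Ollama endpoint, as in the curl example on this page
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, max_tokens: int = 32768) -> dict:
    """Build an /api/chat payload that pins the sampling parameters explicitly."""
    return {
        "model": "stewartpark/qwen3.5",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {
            "temperature": 0.6,
            "top_p": 0.95,
            "top_k": 20,
            "num_predict": max_tokens,  # cap on generated tokens (assumed 32K)
        },
    }


def send(payload: dict) -> dict:
    """POST the payload to a running Ollama server and return the parsed reply."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_chat_request("Hello!")
    print(json.dumps(payload, indent=2))
    # With a local Ollama server running and the model pulled:
    # reply = send(payload)
    # print(reply["message"]["content"])
```

Building the payload separately from sending it keeps the request easy to inspect or log before it reaches the server.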