stewartpark/qwen3.5:27b-q8_0

204 pulls · updated yesterday

Qwen3.5 models (27B dense + 122B MoE) with always-on thinking, optimized for agentic coding, tool use, and browser automation.

vision tools thinking 27b 122b
ollama run stewartpark/qwen3.5:27b-q8_0

Details


6960f7794400 · 30GB · arch qwen35 · parameters 27.8B · quantization Q8_0
license · Apache License, Version 2.0, January 2004 (http://www.apache.org/licenses/)
template · ChatML-style Qwen template ("{{- if .Messages }} {{- if or .System .Tools }}<|im_start|>system …")
system · "You are a helpful AI assistant. Core behaviors: - Think step-by-step inside <think> tags before resp…"
params · min_p 0 · num_ctx 262144 · num_predict 32768 · stop "<|im_end|>"

Readme

This is a customized Qwen3.5 collection designed for agentic workflows, available in three configurations:

  • qwen3.5:27b (default) — 27B dense parameters quantized to Q8_0 (~29GB). Native 256K context window with always-enabled thinking mode. Near-lossless quality at half the VRAM of BF16, accessible on 48GB GPUs like the RTX A6000 or dual-24GB setups.
  • qwen3.5:27b-bf16 — Same 27B dense model in full BF16 precision (~54GB). Better for vision and browser automation workloads where full precision matters. Requires high-VRAM GPUs like the RTX PRO 6000 96GB.
  • qwen3.5:122b — 122B Mixture-of-Experts architecture with 256 experts and 10B active parameters. Same 256K context and thinking capabilities, engineered for advanced reasoning where GPU memory permits.
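As a rough guide to choosing between the dense variants above, the sketch below picks the largest build whose approximate weight size (as listed in this README) plus some KV-cache headroom fits in free VRAM. The function name, the headroom figure, and the decision rule are this sketch's assumptions, not part of the model; the 122B MoE build is omitted because its weight size is not stated above.

```python
# Illustrative rule of thumb only: maps free VRAM to one of the dense tags
# listed above, using the approximate weight sizes stated in this README.
# Thresholds and names are assumptions made for this sketch.

VARIANTS = [
    # (tag, approximate weight size in GB, from the list above)
    ("stewartpark/qwen3.5:27b", 29),       # Q8_0 dense
    ("stewartpark/qwen3.5:27b-bf16", 54),  # BF16 dense
]

def pick_variant(free_vram_gb: float, headroom_gb: float = 8.0) -> str:
    """Return the largest listed variant whose weights plus headroom fit."""
    best = None
    for tag, size_gb in VARIANTS:
        if size_gb + headroom_gb <= free_vram_gb:
            best = tag  # variants are ordered smallest to largest
    return best or "no listed variant fits; consider a smaller quantization"

print(pick_variant(48))  # a 48GB GPU (e.g. RTX A6000) takes the Q8_0 build
print(pick_variant(96))  # a 96GB GPU (e.g. RTX PRO 6000) takes the BF16 build
```

The 8GB headroom default is a placeholder; actual KV-cache usage grows with `num_ctx` and batch size, so size it for your workload.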

All variants ship with the recommended sampling parameters (temperature 0.6, top_p 0.95, top_k 20) and support up to 32K output tokens. The models handle multi-step tool calling, autonomous browser operations, and complex coding tasks.
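Beyond the CLI, the model can be driven through Ollama's local REST API (`POST /api/chat` on the default port 11434). The sketch below builds a request payload using the sampling defaults listed above and attaches one tool definition in Ollama's OpenAI-style function format; the `get_weather` tool is a hypothetical example for illustration, not something bundled with the model.

```python
# Minimal sketch of an /api/chat payload for a local Ollama server.
# Sampling options mirror the defaults stated in this README; the
# get_weather tool schema is hypothetical and only illustrates the shape.
import json

def build_chat_request(prompt: str) -> dict:
    return {
        "model": "stewartpark/qwen3.5:27b-q8_0",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {
            "temperature": 0.6,   # recommended defaults from this README
            "top_p": 0.95,
            "top_k": 20,
            "min_p": 0,
            "num_ctx": 262144,    # native 256K context window
            "num_predict": 32768, # up to 32K output tokens
        },
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

payload = build_chat_request("What's the weather in Tokyo?")
print(json.dumps(payload, indent=2))
# Send with e.g.: curl http://localhost:11434/api/chat -d "$(cat payload.json)"
```

When the model decides to call the tool, the response's message carries a `tool_calls` entry instead of plain content; your code executes the function and feeds the result back as a `"role": "tool"` message to continue the agentic loop.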