stewartpark/qwen3.5:27b-q8_0

204 pulls · updated yesterday

Qwen3.5 models (27B dense + 122B MoE) with always-on thinking, optimized for agentic coding, tool use, and browser automation.

vision tools thinking 27b 122b
ollama run stewartpark/qwen3.5:27b-q8_0

Details


6960f7794400 · 30GB · arch qwen35 · parameters 27.8B · quantization Q8_0
license · Apache License, Version 2.0, January 2004 (http://www.apache.org/licenses/)
template · ChatML-style Qwen template ("{{- if .Messages }} {{- if or .System .Tools }}<|im_start|>system …")
system · "You are a helpful AI assistant. Core behaviors: - Think step-by-step inside <think> tags before resp…"
params · min_p 0 · num_ctx 262144 · num_predict 32768 · stop "<|im_end|>"

Readme

This is a customized Qwen3.5 collection designed for agentic workflows, available in three configurations:

  • qwen3.5:27b (default) — 27B dense parameters quantized to Q8_0 (~29GB). Native 256K context window with always-enabled thinking mode. Near-lossless quality at half the VRAM of BF16, accessible on 48GB GPUs like the RTX A6000 or dual-24GB setups.
  • qwen3.5:27b-bf16 — Same 27B dense model in full BF16 precision (~54GB). Better for vision and browser automation workloads where full precision matters. Requires high-VRAM GPUs like the RTX PRO 6000 96GB.
  • qwen3.5:122b — 122B Mixture-of-Experts architecture with 256 experts and 10B active parameters. Same 256K context and thinking capabilities, engineered for advanced reasoning where GPU memory permits.
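As a rough guide to choosing between the dense variants above, the sketch below picks the largest build whose approximate weight size (as listed in this README) plus some KV-cache headroom fits in free VRAM. The function name, the headroom figure, and the decision rule are this sketch's assumptions, not part of the model; the 122B MoE build is omitted because its weight size is not stated above.

```python
# Illustrative rule of thumb only: maps free VRAM to one of the dense tags
# listed above, using the approximate weight sizes stated in this README.
# Thresholds and names are assumptions made for this sketch.

VARIANTS = [
    # (tag, approximate weight size in GB, from the list above)
    ("stewartpark/qwen3.5:27b", 29),       # Q8_0 dense
    ("stewartpark/qwen3.5:27b-bf16", 54),  # BF16 dense
]

def pick_variant(free_vram_gb: float, headroom_gb: float = 8.0) -> str:
    """Return the largest listed variant whose weights plus headroom fit."""
    best = None
    for tag, size_gb in VARIANTS:
        if size_gb + headroom_gb <= free_vram_gb:
            best = tag  # variants are ordered smallest to largest
    return best or "no listed variant fits; consider a smaller quantization"

print(pick_variant(48))  # a 48GB GPU (e.g. RTX A6000) takes the Q8_0 build
print(pick_variant(96))  # a 96GB GPU (e.g. RTX PRO 6000) takes the BF16 build
```

The 8GB headroom default is a placeholder; actual KV-cache usage grows with `num_ctx` and batch size, so size it for your workload.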

All variants ship with the recommended sampling parameters (temperature 0.6, top_p 0.95, top_k 20) and support up to 32K output tokens. The models handle multi-step tool calling, autonomous browser operations, and complex coding tasks.
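Beyond the CLI, the model can be driven through Ollama's local REST API (`POST /api/chat` on the default port 11434). The sketch below builds a request payload using the sampling defaults listed above and attaches one tool definition in Ollama's OpenAI-style function format; the `get_weather` tool is a hypothetical example for illustration, not something bundled with the model.

```python
# Minimal sketch of an /api/chat payload for a local Ollama server.
# Sampling options mirror the defaults stated in this README; the
# get_weather tool schema is hypothetical and only illustrates the shape.
import json

def build_chat_request(prompt: str) -> dict:
    return {
        "model": "stewartpark/qwen3.5:27b-q8_0",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {
            "temperature": 0.6,   # recommended defaults from this README
            "top_p": 0.95,
            "top_k": 20,
            "min_p": 0,
            "num_ctx": 262144,    # native 256K context window
            "num_predict": 32768, # up to 32K output tokens
        },
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

payload = build_chat_request("What's the weather in Tokyo?")
print(json.dumps(payload, indent=2))
# Send with e.g.: curl http://localhost:11434/api/chat -d "$(cat payload.json)"
```

When the model decides to call the tool, the response's message carries a `tool_calls` entry instead of plain content; your code executes the function and feeds the result back as a `"role": "tool"` message to continue the agentic loop.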