
A QWen3.5 large model that runs entirely on an A5000 or 4090 GPU (64k context), with tool-calling capability, suited to local deployment with Lobster and Hermes.

ollama run jedi-knight/qwen3.5-27b-64k-tools:v1.0

Details

16 hours ago · 2e5efbb56699 · 13GB

  • Architecture: qwen35
  • Parameters: 26.9B
  • Quantization: Q3_K_M
  • System prompt (excerpt): Comprehensive reasoning mode. If request is simple, then just answer. Otherwise, analyze the request
  • Runtime parameters (excerpt): { "num_ctx": 65536, "presence_penalty": 1.5, "repeat_last_n": 512, "repeat_penalty":
  • Template (excerpt): {{ .Prompt }}

Readme

Qwen3.5-27B 64K-Tools

A customized distribution of Qwen3.5-27B with three key modifications:

  1. Extended Context — 64K tokens (num_ctx raised from the 4K default to 65,536)
  2. Tool Use Enabled — Native function calling via official Qwen3.5 renderer/parser
  3. 100% GPU on 24GB — Fits entirely on RTX 3090 / 4090 / A5000

Quick Start

ollama pull jedi-knight/qwen3.5-27b-64k-tools
ollama run jedi-knight/qwen3.5-27b-64k-tools
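Tool calling can be exercised through Ollama's `/api/chat` endpoint, which accepts an OpenAI-style `tools` array. A minimal sketch of the request body (the `get_weather` tool is a hypothetical example, not something shipped with this model):

```python
import json

# Request payload for POST http://localhost:11434/api/chat.
# The get_weather tool below is illustrative only.
payload = {
    "model": "jedi-knight/qwen3.5-27b-64k-tools",
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "stream": False,
}

# Send this with e.g. requests.post(...); when the model decides to call
# the tool, the reply carries the call in message.tool_calls.
print(json.dumps(payload, indent=2))
```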

Hardware Requirements

| GPU | VRAM | Status |
|-----------|-------|-------------------------|
| RTX 3090 | 24 GB | ✅ 100% GPU |
| RTX 4090 | 24 GB | ✅ 100% GPU |
| RTX A5000 | 24 GB | ✅ 100% GPU |
| RTX 4080 | 16 GB | ❌ Requires CPU offload |

Memory Breakdown

| Component | Size |
|------------------|----------|
| Weights (Q3_K_M) | ~16.5 GB |
| KV Cache (64K) | ~4.5 GB |
| Total | ~21 GB |
| VRAM Headroom | ~3 GB |
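The KV-cache line in the table can be sanity-checked with the standard sizing formula: 2 (K and V) × layers × KV heads × head dim × context length × bytes per element. A sketch below — the architecture numbers are illustrative assumptions chosen to land near the table's ~4.5 GB figure, not the actual Qwen3.5-27B config:

```python
# Rough KV-cache sizing formula; all architecture numbers here are
# hypothetical assumptions, not the real Qwen3.5-27B configuration.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # 2 accounts for separate K and V caches; default fp16 elements.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Example: a hypothetical 36-layer GQA model with 4 KV heads of dim 128
# at the full 65,536-token context, fp16 cache:
gib = kv_cache_bytes(36, 4, 128, 65536) / 2**30
print(f"~{gib:.1f} GiB")  # → ~4.5 GiB
```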

Model Details

  • Base Model: Qwen3.5-27B (Alibaba Cloud)
  • Quantization: Q3_K_M
  • Architecture: qwen35
  • Parameters: 26.9B
  • Max Context: 65,536
  • Capabilities: completion, tools, thinking

Comparison with Official Version

| Feature | Official qwen3.5:27b | This Model |
|-----------------|----------------------|------------|
| Quantization | Q4_K_M | Q3_K_M |
| Default Context | 32,768 | 65,536 |
| Total Size | ~25 GB | ~21 GB |
| GPU Load | 84% GPU / 16% CPU | 100% GPU |
| Tools Support | ❌ | ✅ |

Build from Source

ollama create qwen3.5-27b-64k-tools -f Modelfile
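The Modelfile itself is not shown on this page; a plausible sketch, assuming a locally downloaded GGUF (the filename is hypothetical) and the runtime parameters listed under Details:

```
# Hypothetical Modelfile sketch; filename and system prompt are
# reconstructed from the Details excerpts above.
FROM ./Qwen3.5-27B-Q3_K_M.gguf
PARAMETER num_ctx 65536
PARAMETER presence_penalty 1.5
PARAMETER repeat_last_n 512
SYSTEM """Comprehensive reasoning mode. If request is simple, then just answer. Otherwise, analyze the request."""
# TEMPLATE omitted: tool calling relies on the full official Qwen3.5
# chat template rather than a bare {{ .Prompt }} passthrough.
```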

License

This model is based on Qwen3.5-27B by Alibaba Cloud, licensed under Apache License 2.0. The Q3_K_M GGUF weights are derived from the community conversion by bartowski.