
Applications

- Claude Code: `ollama launch claude --model iliafed/qwen3.6turboquant`
- Codex App: `ollama launch codex-app --model iliafed/qwen3.6turboquant`
- OpenClaw: `ollama launch openclaw --model iliafed/qwen3.6turboquant`
- Hermes Agent: `ollama launch hermes --model iliafed/qwen3.6turboquant`
- Codex: `ollama launch codex --model iliafed/qwen3.6turboquant`
- OpenCode: `ollama launch opencode --model iliafed/qwen3.6turboquant`


# qwen3.6turboquant

Qwen 3.6 35B / 36B MoE for Ollama.

This model is published as Q4_K_M weights and is intended to be run with Ollama's TurboQuant KV cache, which is enabled at runtime by setting `OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3`.

## Pull

```bash
ollama pull iliafed/qwen3.6turboquant
```

## Run With TurboQuant

PowerShell:

```powershell
$env:OLLAMA_FLASH_ATTENTION="1"
$env:OLLAMA_KV_CACHE_TYPE="tbqp3/tbq3"
$env:OLLAMA_CONTEXT_LENGTH="262144"

ollama run iliafed/qwen3.6turboquant
```

Linux/macOS:

```bash
OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3 \
OLLAMA_CONTEXT_LENGTH=262144 \
ollama run iliafed/qwen3.6turboquant
```
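
In stock Ollama these variables are read by the server process, so they may have no effect if an Ollama server is already running in the background when `ollama run` is called. The following is a minimal sketch, assuming the same holds for the TurboQuant build, that exports the settings and optionally starts the server in the same environment. The helper name `set_turboquant_env` and the `START_SERVER` and `CTX` switches are hypothetical and exist only for this illustration.

```bash
#!/usr/bin/env bash
# Sketch: export the TurboQuant settings, then optionally start the server.
# Assumption: the variables are read by the `ollama serve` process.

set_turboquant_env() {
  # Context length defaults to the model's 262144; a smaller value
  # can be passed as the first argument to reduce KV-cache memory.
  local ctx="${1:-262144}"
  export OLLAMA_FLASH_ATTENTION=1
  export OLLAMA_KV_CACHE_TYPE="tbqp3/tbq3"
  export OLLAMA_CONTEXT_LENGTH="$ctx"
}

set_turboquant_env "${CTX:-262144}"

# START_SERVER is a hypothetical opt-in switch for this sketch.
if [ "${START_SERVER:-0}" = "1" ]; then
  exec ollama serve
fi
```

With the server started this way, `ollama run iliafed/qwen3.6turboquant` can be run from a second terminal without repeating the variables.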

## Model Details

- Architecture: qwen35moe
- Parameters: 36.0B
- Weight quantization: Q4_K_M
- Context length: 262144
- Embedding length: 2048
- Capabilities: completion, vision, tools, thinking
- License: Apache 2.0

## Important Note

`tbqp3/tbq3` is an Ollama runtime KV-cache setting, not a weight quantization format stored in the model manifest.

The model weights are Q4_K_M. TurboQuant is enabled by setting the following environment variables before running Ollama:

```bash
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3
```

Ollama 0.24.0 does not accept either of the following commands, because `tbq3` and `tbqp3` are not supported weight quantization targets:

```bash
ollama create --quantize tbq3
ollama create --quantize tbqp3
```
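
To confirm that the stored weights are Q4_K_M regardless of the KV-cache setting, the model metadata can be inspected with `ollama show`. The sketch below assumes that the command prints a row of the form `quantization <value>`. That layout is an assumption about the CLI output, and the helper name `weight_quantization` is hypothetical.

```bash
# Sketch: read `ollama show` output on stdin and print the value of the
# "quantization" row. Assumes that row has the form "quantization <value>".
weight_quantization() {
  awk '$1 == "quantization" { print $2 }'
}

# Usage (requires a local Ollama install with the model pulled):
#   ollama show iliafed/qwen3.6turboquant | weight_quantization
```

If the manifest matches the weight quantization stated above, this should report Q4_K_M whether or not the TurboQuant variables are set, since the KV-cache type is a runtime setting only.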