109 2 weeks ago

vision tools thinking
ollama run iliafed/qwen3.6turboquant

Details

359129bca53a · 24GB · qwen35moe · 36B · Q4_K_M
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
{ "min_p": 0, "presence_penalty": 1.5, "repeat_penalty": 1, "temperature": 1, "t
{{ .Prompt }}

Readme

# qwen3.6turboquant

Qwen 3.6 35B / 36B MoE for Ollama.

This model is published as Q4_K_M weights and is intended to be run with the Ollama TurboQuant KV cache, which is enabled at runtime by setting the KV cache type to `tbqp3/tbq3`.

## Pull

```bash
ollama pull iliafed/qwen3.6turboquant
```

## Run With TurboQuant

PowerShell:

```powershell
$env:OLLAMA_FLASH_ATTENTION="1"
$env:OLLAMA_KV_CACHE_TYPE="tbqp3/tbq3"
$env:OLLAMA_CONTEXT_LENGTH="262144"

ollama run iliafed/qwen3.6turboquant
```

Linux/macOS:

```bash
OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3 \
OLLAMA_CONTEXT_LENGTH=262144 \
ollama run iliafed/qwen3.6turboquant
```
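
In upstream Ollama these three variables are read by the server process rather than by the `ollama run` client, so if a server is already running in the background it may need to be restarted with them set. Assuming this build behaves the same way, a small preflight check can confirm the environment before the server is started. This is a minimal sketch; `check_tbq_env` is a hypothetical helper and not part of Ollama, and the expected values are simply the ones recommended in this README.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of Ollama): reports whether the
# current environment carries the three settings this README recommends.
check_tbq_env() {
  local status=0
  # Expected values are copied from the README; adjust if yours differ.
  [ "${OLLAMA_FLASH_ATTENTION:-}" = "1" ] \
    || { echo "OLLAMA_FLASH_ATTENTION should be 1"; status=1; }
  [ "${OLLAMA_KV_CACHE_TYPE:-}" = "tbqp3/tbq3" ] \
    || { echo "OLLAMA_KV_CACHE_TYPE should be tbqp3/tbq3"; status=1; }
  [ "${OLLAMA_CONTEXT_LENGTH:-}" = "262144" ] \
    || { echo "OLLAMA_CONTEXT_LENGTH should be 262144"; status=1; }
  [ "$status" -eq 0 ] && echo "TurboQuant environment looks complete"
  return "$status"
}

# Example use (left commented so that sourcing this file has no side effects):
#   export OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3 OLLAMA_CONTEXT_LENGTH=262144
#   check_tbq_env && ollama serve
```

The helper only inspects the calling shell's environment; it cannot tell what an already-running server was started with.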

## Model Details

- Architecture: qwen35moe
- Parameters: 36.0B
- Weight quantization: Q4_K_M
- Context length: 262144
- Embedding length: 2048
- Capabilities: completion, vision, tools, thinking
- License: Apache 2.0

## Important Note

`tbqp3/tbq3` is an Ollama runtime KV-cache setting, not a weight quantization format stored in the model manifest.

The model weights are Q4_K_M. TurboQuant behavior is enabled by setting the following before running Ollama:

```bash
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3
```

Ollama 0.24.0 does not accept either of the following commands, because `tbq3` and `tbqp3` are not supported weight quantization targets:

```bash
ollama create --quantize tbq3
ollama create --quantize tbqp3
```
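
The split between a runtime KV-cache type and a weight quantization target can be sketched as a guard in a build script, so that a TurboQuant name is never passed to `ollama create --quantize` by mistake. This is only an illustration: `classify_quant_arg` is a hypothetical helper, it recognizes only the names mentioned in this README, and it does not validate other values against the targets Ollama actually supports.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Ollama): decides where a quantization
# name belongs, based only on the names mentioned in this README.
classify_quant_arg() {
  case "$1" in
    tbq3 | tbqp3 | tbqp3/tbq3)
      # TurboQuant names are KV-cache types: set OLLAMA_KV_CACHE_TYPE instead.
      echo "kv-cache"
      ;;
    *)
      # Anything else is assumed to be meant as a weight quantization target;
      # it is NOT checked against Ollama's supported list here.
      echo "weights"
      ;;
  esac
}
```

For example, `classify_quant_arg tbq3` prints `kv-cache`, while `classify_quant_arg Q4_K_M` prints `weights`.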