iliafed/qwen3.6turboquant

Updated 2 weeks ago

359129bca53a · 24GB ·

model

arch: qwen35moe

parameters: 36B

quantization: Q4_K_M

24GB

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

12kB

params

{ "min_p": 0, "presence_penalty": 1.5, "repeat_penalty": 1, "temperature": 1, "t

94B

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

11kB

template

13B

# qwen3.6turboquant

Qwen 3.6 35B / 36B MoE for Ollama.

This model is published as Q4_K_M weights and is intended to be run with Ollama's TurboQuant KV cache, which is enabled by setting `OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3`.

## Pull

```bash
ollama pull iliafed/qwen3.6turboquant
```

## Run With TurboQuant

PowerShell:

```powershell
$env:OLLAMA_FLASH_ATTENTION="1"
$env:OLLAMA_KV_CACHE_TYPE="tbqp3/tbq3"
$env:OLLAMA_CONTEXT_LENGTH="262144"

ollama run iliafed/qwen3.6turboquant
```

Linux/macOS:

```bash
OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3 \
OLLAMA_CONTEXT_LENGTH=262144 \
ollama run iliafed/qwen3.6turboquant
```
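To avoid retyping the three variables, they can be kept in one place. The following is a minimal sketch, not part of the published model: `tbq_env` and `tbq_run` are hypothetical helper names, and the optional context-length argument assumes that values smaller than 262144 are acceptable on your setup.

```bash
# Hypothetical helper: print the TurboQuant settings as KEY=VALUE lines.
# An optional first argument overrides the context length (default 262144).
tbq_env() {
  printf 'OLLAMA_FLASH_ATTENTION=1\n'
  printf 'OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3\n'
  printf 'OLLAMA_CONTEXT_LENGTH=%s\n' "${1:-262144}"
}

# Hypothetical wrapper: run the model with those settings applied.
# env(1) takes KEY=VALUE pairs followed by the command to execute;
# the values contain no spaces, so plain word splitting is safe here.
tbq_run() {
  env $(tbq_env "${1:-262144}") ollama run iliafed/qwen3.6turboquant
}
```

Calling `tbq_run` uses the full context length, while `tbq_run 32768` would request a smaller one.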

## Model Details

- Architecture: qwen35moe
- Parameters: 36.0B
- Weight quantization: Q4_K_M
- Context length: 262144
- Embedding length: 2048
- Capabilities: completion, vision, tools, thinking
- License: Apache 2.0
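As a rough sanity check on these figures, the published 24GB size and the 36.0B parameter count imply an average of a little over 5 bits per weight. This is only a back-of-the-envelope sketch: it assumes 24GB means 24 × 10^9 bytes and that the whole file consists of weights, neither of which the page confirms.

```bash
# Average bits per weight = (file size in bytes * 8) / parameter count.
# Assumptions (not confirmed by the page): 24GB = 24e9 bytes, all of it weights.
bits_per_weight() {
  awk 'BEGIN { printf "%.2f\n", (24e9 * 8) / 36.0e9 }'
}

bits_per_weight   # prints 5.33
```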

## Important Note

`tbqp3/tbq3` is an Ollama runtime KV-cache setting, not a model weight quantization format stored inside the model manifest.

The model weights are Q4_K_M. TurboQuant behavior is enabled by setting:

```bash
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3
```

before running Ollama.
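Because the setting lives in the environment rather than in the model, it is easy to launch without it. A small preflight check along these lines can catch that; `tbq_preflight` is a hypothetical name, and the sketch only inspects the current shell's environment. Assuming this build behaves like stock Ollama, where these variables are read by the server process, an Ollama server that is already running keeps whatever environment it was started with.

```bash
# Hypothetical preflight check: succeed only if both settings from the
# note above are present in the current environment.
tbq_preflight() {
  if [ "${OLLAMA_FLASH_ATTENTION:-}" != "1" ]; then
    echo "OLLAMA_FLASH_ATTENTION must be set to 1" >&2
    return 1
  fi
  if [ "${OLLAMA_KV_CACHE_TYPE:-}" != "tbqp3/tbq3" ]; then
    echo "OLLAMA_KV_CACHE_TYPE must be set to tbqp3/tbq3" >&2
    return 1
  fi
  return 0
}
```

A typical use would be `tbq_preflight && ollama run iliafed/qwen3.6turboquant`, so the model only starts when both variables are set.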

Ollama 0.24.0 does not accept:

```bash
ollama create --quantize tbq3
ollama create --quantize tbqp3
```

because those are not supported weight quantization targets.
