iliafed/qwen3.6turboquant

109 downloads · Updated 2 weeks ago

Capabilities: vision, tools, thinking

CLI:

ollama run iliafed/qwen3.6turboquant

cURL:

curl http://localhost:11434/api/chat \
  -d '{
    "model": "iliafed/qwen3.6turboquant",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Python:

from ollama import chat

response = chat(
    model='iliafed/qwen3.6turboquant',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

JavaScript:

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'iliafed/qwen3.6turboquant',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
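The chat requests above send text only. Since this model lists image input, the following is a minimal sketch of attaching an image to the same `/api/chat` endpoint using only the Python standard library. The helper names `build_image_chat_payload` and `send_chat` are hypothetical, and the sketch assumes a local Ollama server on the default port, as in the cURL example.

```python
import base64
import json
import urllib.request

# Default local endpoint, taken from the cURL example above.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL = "iliafed/qwen3.6turboquant"


def build_image_chat_payload(prompt: str, image_bytes: bytes, model: str = MODEL) -> dict:
    """Build a non-streaming chat request with one base64-encoded image attached."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "stream": False,  # ask for a single JSON object instead of a stream
        "messages": [
            {"role": "user", "content": prompt, "images": [image_b64]},
        ],
    }


def send_chat(payload: dict, url: str = OLLAMA_CHAT_URL) -> str:
    """POST the payload to the chat endpoint and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]
```

To use it, read an image file in binary mode, pass the bytes to `build_image_chat_payload` together with a prompt, and hand the result to `send_chat`. Building the payload separately from sending it makes the request easy to inspect without a running server.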

Applications

- Claude Code: `ollama launch claude --model iliafed/qwen3.6turboquant`
- Codex App: `ollama launch codex-app --model iliafed/qwen3.6turboquant`
- OpenClaw: `ollama launch openclaw --model iliafed/qwen3.6turboquant`
- Hermes Agent: `ollama launch hermes --model iliafed/qwen3.6turboquant`
- Codex: `ollama launch codex --model iliafed/qwen3.6turboquant`
- OpenCode: `ollama launch opencode --model iliafed/qwen3.6turboquant`

Models

1 model

| Name | Size | Context | Input | Updated |
| --- | --- | --- | --- | --- |
| qwen3.6turboquant:latest | 24GB | 256K | Text, Image | 2 weeks ago |

Readme

# qwen3.6turboquant

Qwen 3.6 35B / 36B MoE for Ollama.

This model is published as Q4_K_M weights and is intended to be used with Ollama TurboQuant KV cache enabled via `tbqp3/tbq3`.

## Pull

```bash
ollama pull iliafed/qwen3.6turboquant
```

## Run With TurboQuant

PowerShell:

```powershell
$env:OLLAMA_FLASH_ATTENTION="1"
$env:OLLAMA_KV_CACHE_TYPE="tbqp3/tbq3"
$env:OLLAMA_CONTEXT_LENGTH="262144"

ollama run iliafed/qwen3.6turboquant
```

Linux/macOS:

```bash
OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3 \
OLLAMA_CONTEXT_LENGTH=262144 \
ollama run iliafed/qwen3.6turboquant
```

## Model Details

- Architecture: `qwen35moe`
- Parameters: `36.0B`
- Weight quantization: `Q4_K_M`
- Context length: `262144`
- Embedding length: `2048`
- Capabilities: completion, vision, tools, thinking
- License: Apache 2.0

## Important Note

`tbqp3/tbq3` is an Ollama runtime KV-cache setting, not a model weight quantization format stored inside the model manifest.

The model weights are `Q4_K_M`. TurboQuant behavior is enabled by setting:

```bash
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3
```

before running Ollama.

Ollama 0.24.0 does not accept:

```bash
ollama create --quantize tbq3
ollama create --quantize tbqp3
```

because those are not supported weight quantization targets.
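Because TurboQuant is switched on through environment variables rather than through the model file, it can help to set and verify them in one place. The sketch below mirrors the PowerShell and bash snippets in Python. The helper names (`turboquant_env`, `missing_settings`, `run_model`) are hypothetical, and it assumes the `ollama` CLI is on `PATH` and that the installed Ollama build recognizes `tbqp3/tbq3`. If an Ollama server is already running in the background, these variables likely need to be set in that server's environment instead.

```python
import os
import subprocess

MODEL = "iliafed/qwen3.6turboquant"

# Values copied from the README's PowerShell and bash snippets.
TURBOQUANT_SETTINGS = {
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "tbqp3/tbq3",
    "OLLAMA_CONTEXT_LENGTH": "262144",
}


def turboquant_env(base=None):
    """Return a copy of the base environment with the TurboQuant settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(TURBOQUANT_SETTINGS)
    return env


def missing_settings(env):
    """List the TurboQuant variables that are absent or differ from the README values."""
    return [name for name, value in TURBOQUANT_SETTINGS.items() if env.get(name) != value]


def run_model(model=MODEL):
    """Start `ollama run` with the settings applied (requires the ollama CLI on PATH)."""
    return subprocess.run(["ollama", "run", model], env=turboquant_env(), check=True)
```

Calling `missing_settings(os.environ)` before launching gives a quick check of whether the current shell is configured as the README describes. An empty list means all three variables match.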

