Qwen3.5 optimized for low VRAM

Tags: tools, thinking, 9b

ollama run reecdev/qwen3.5-lowvram:9b

curl http://localhost:11434/api/chat \
  -d '{
    "model": "reecdev/qwen3.5-lowvram:9b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
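By default, /api/chat streams its reply as newline-delimited JSON: one object per line, each carrying a piece of message.content, with the last object marked "done": true. Below is a minimal standard-library sketch of the same request as the curl call, assuming the default local endpoint and streaming left on. The helper names collect_reply and stream_chat are hypothetical, not part of any Ollama library.

```python
import json
import urllib.request
from typing import Iterable, Union

# Default local Ollama endpoint (same URL as the curl example above)
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def collect_reply(lines: Iterable[Union[bytes, str]]) -> str:
    """Join the message.content pieces of a streamed /api/chat response.

    Each non-empty line is one JSON object; the final one has "done": true.
    """
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # ignore blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final chunk ends the stream
    return "".join(parts)


def stream_chat(prompt: str, model: str = "reecdev/qwen3.5-lowvram:9b") -> str:
    """Send one user message to a local Ollama server and return the full reply."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the HTTP response yields one line (one JSON chunk) at a time
        return collect_reply(resp)
```

With the model pulled and Ollama running, print(stream_chat("Hello!")) should print the assembled reply. Adding "stream": false to the request body instead returns a single JSON object.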

from ollama import chat

response = chat(
    model='reecdev/qwen3.5-lowvram:9b',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)
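The model is tagged "tools", so it is meant to be usable for tool calling. A minimal sketch, assuming the ollama Python package (0.4 or later) accepts plain Python functions in its tools parameter: add_two_numbers and the TOOLS registry are hypothetical examples. run_tool_calls reads each call with dictionary-style indexing, which works on plain dicts and, as far as we know, on the library's response objects too.

```python
from typing import Any, Callable, Mapping


def add_two_numbers(a: int, b: int) -> int:
    """Add two integers.

    Args:
      a: The first integer
      b: The second integer
    """
    return int(a) + int(b)  # cast in case the model sends numbers as strings


# Hypothetical registry: tool name the model sees -> Python function to run
TOOLS: dict[str, Callable[..., Any]] = {"add_two_numbers": add_two_numbers}


def run_tool_calls(tool_calls) -> list[tuple[str, Any]]:
    """Execute each requested tool call and return (name, result) pairs."""
    results = []
    for call in tool_calls or []:  # tool_calls is None when no tool was requested
        name = call["function"]["name"]
        args: Mapping[str, Any] = call["function"]["arguments"]
        fn = TOOLS.get(name)
        if fn is None:
            results.append((name, f"unknown tool: {name}"))
            continue
        results.append((name, fn(**args)))
    return results


def ask_with_tools(prompt: str) -> list[tuple[str, Any]]:
    """Ask the model a question and run any tools it requests (needs a running server)."""
    from ollama import chat  # lazy import so run_tool_calls works without the package

    response = chat(
        model="reecdev/qwen3.5-lowvram:9b",
        messages=[{"role": "user", "content": prompt}],
        tools=[add_two_numbers],  # schema is derived from the signature and docstring
    )
    return run_tool_calls(response.message.tool_calls)
```

For example, ask_with_tools("What is 2 + 3?") would return [("add_two_numbers", 5)] if the model chooses to call the tool. Feeding the results back to the model as follow-up messages is left out of this sketch.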

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'reecdev/qwen3.5-lowvram:9b',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Applications

Claude Code: ollama launch claude --model reecdev/qwen3.5-lowvram:9b

Codex App: ollama launch codex-app --model reecdev/qwen3.5-lowvram:9b

OpenClaw: ollama launch openclaw --model reecdev/qwen3.5-lowvram:9b

Hermes Agent: ollama launch hermes --model reecdev/qwen3.5-lowvram:9b

Codex: ollama launch codex --model reecdev/qwen3.5-lowvram:9b

OpenCode: ollama launch opencode --model reecdev/qwen3.5-lowvram:9b

Models

qwen3.5-lowvram:9b

4.8GB · 256K context window · Text input · 2 months ago

Readme

Qwen3.5-LowVRAM

Qwen3.5-LowVRAM is a version of Qwen3.5 9B optimized for GPUs with 6 GB of VRAM. It reduces VRAM usage by about 1.2 GB with near-zero quality loss.

Basic Usage

You can pull Qwen3.5-LowVRAM like this:

ollama pull reecdev/qwen3.5-lowvram:9b

and run it:

ollama run reecdev/qwen3.5-lowvram:9b

Tested Hardware

Qwen3.5-LowVRAM was tested on an NVIDIA GeForce RTX 3050 (6 GB) across a range of tasks, including tool use and coding. It averaged 14 tokens per second, compared with 2 tokens per second for the regular Qwen3.5 9B, which had to offload part of the model from the GPU, and it completed these tasks successfully.
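To compare tokens-per-second figures on your own GPU, ollama run --verbose prints an eval rate after each reply. The same number can be computed from the final ("done") response of the API, whose eval_count field is the number of generated tokens and eval_duration is the generation time in nanoseconds. A small sketch; the helper name tokens_per_second is hypothetical.

```python
def tokens_per_second(final_chunk: dict) -> float:
    """Generation speed from the final /api/chat (or /api/generate) response.

    eval_count = generated tokens; eval_duration = generation time in nanoseconds.
    """
    count = final_chunk["eval_count"]
    duration_ns = final_chunk["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return count * 1e9 / duration_ns  # tokens per nanosecond scaled to per second
```

For example, 140 tokens generated in 10 seconds (eval_duration = 10_000_000_000) gives 14.0 tokens per second. By the same measure, the reported 14 vs. 2 tokens per second is roughly a 7x speedup.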

Notes

Qwen3.5-LowVRAM is only recommended for GPUs that cannot run the regular Qwen3.5-9B. If your GPU can run the regular Qwen3.5-9B, use that model instead.