Qwen3.5 optimized for low VRAM

Capabilities: tools, thinking. Size: 9b
ollama run reecdev/qwen3.5-lowvram:9b

Applications

Claude Code: ollama launch claude --model reecdev/qwen3.5-lowvram:9b
OpenClaw: ollama launch openclaw --model reecdev/qwen3.5-lowvram:9b
Hermes Agent: ollama launch hermes --model reecdev/qwen3.5-lowvram:9b
Codex: ollama launch codex --model reecdev/qwen3.5-lowvram:9b
OpenCode: ollama launch opencode --model reecdev/qwen3.5-lowvram:9b

Readme

Qwen3.5-LowVRAM

Qwen3.5-LowVRAM is a version of Qwen3.5 9B optimized for GPUs with 6 GB of VRAM. It cuts VRAM usage by about 1.2 GB with near-zero quality loss.

Basic Usage

You can pull Qwen3.5-LowVRAM like this:

ollama pull reecdev/qwen3.5-lowvram:9b

and run it:

ollama run reecdev/qwen3.5-lowvram:9b
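Besides the interactive CLI, a running Ollama server exposes a local REST API (by default at port 11434). The following is a minimal sketch, assuming the server is running locally on the default port and the model has already been pulled. It sends one non-streaming prompt to the `/api/generate` endpoint using only the Python standard library. The helper names `build_request` and `generate` and the example prompt are illustrative, not part of this model's release.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "reecdev/qwen3.5-lowvram:9b"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the text in the JSON 'response' field."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model pulled.
    print(generate("Write a Python function that reverses a string."))
```

Setting `"stream": False` makes the server return a single JSON object instead of a stream of partial chunks, which keeps the sketch short.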

Tested Hardware

Qwen3.5-LowVRAM was tested on an NVIDIA GeForce RTX 3050 (6 GB) with a range of tasks, including tool use and coding. It averaged 14 tokens per second (compared with 2 tokens per second for the regular Qwen3.5 9B, which had to offload part of the model to the CPU) and completed these tasks successfully.
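The page does not say how the tokens-per-second figures were measured. One way to take a comparable measurement on your own GPU, sketched here as an assumption rather than the author's method, is to use the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields that Ollama's `/api/generate` returns for a non-streaming request. The prompt and the `runs` count are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "reecdev/qwen3.5-lowvram:9b"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (in nanoseconds)."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count * 1e9 / eval_duration_ns


def measure(prompt: str, runs: int = 3) -> float:
    """Average generation speed over several non-streaming requests."""
    rates = []
    for _ in range(runs):
        payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False})
        req = urllib.request.Request(
            OLLAMA_URL,
            data=payload.encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        rates.append(tokens_per_second(body["eval_count"], body["eval_duration"]))
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Requires a running Ollama server with the model pulled.
    print(f"{measure('Explain what a hash map is.'):.1f} tokens/s")
```

For example, 140 tokens generated in 10 seconds (10,000,000,000 ns) gives 14 tokens per second. Running `ollama run reecdev/qwen3.5-lowvram:9b --verbose` also prints an eval rate after each response, which is a quicker check.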

Notes

Qwen3.5-LowVRAM is only recommended for GPUs that cannot run the regular Qwen3.5-9B. If your GPU can fully run the regular Qwen3.5-9B, you should use that instead.