227 1 week ago

Qwen3.5 optimized for low VRAM

tools thinking 9b
ollama run reecdev/qwen3.5-lowvram:9b

Details

1 week ago

4cde4559a80a · 4.8GB ·

qwen35
·
8.95B
·
Q3_K_L
{ "presence_penalty": 1.5, "temperature": 1, "top_k": 20, "top_p": 0.95 }
{{ .Prompt }}

Readme

Qwen3.5-LowVRAM

Qwen3.5-LowVRAM is a version of Qwen3.5 9B optimized for GPUs with 6 GB of VRAM, cutting VRAM usage by about ~1.2 GB with near-zero quality loss.

Basic Usage

You can pull Qwen3.5-LowVRAM like this:

ollama pull reecdev/qwen3.5-lowvram:9b

and run it:

ollama run reecdev/qwen3.5-lowvram:9b

Tested Hardware

Qwen3.5-LowVRAM was tested on an NVIDIA GeForce RTX 3050 6 GB on various tests such as tool-use and coding. It averaged 14 tokens per second (vs. regular Qwen3.5 9B: 2 tokens per second with model offloading) and was able to complete these tasks successfully.

Notes

Qwen3.5-LowVRAM is only reccomended for GPUs that are unable to run the regular Qwen3.5-9B. If your GPU is fully capable of running the regular Qwen3.5-9B, you should use that instead.