Qwen3.5-LowVRAM is a version of Qwen3.5 9B optimized for GPUs with 6 GB of VRAM. It cuts VRAM usage by about 1.2 GB with near-zero quality loss.
You can pull Qwen3.5-LowVRAM like this:
ollama pull reecdev/qwen3.5-lowvram:9b
and run it:
ollama run reecdev/qwen3.5-lowvram:9b
Qwen3.5-LowVRAM was tested on an NVIDIA GeForce RTX 3050 6 GB with a range of tasks, including tool use and coding. It averaged 14 tokens per second, compared with 2 tokens per second for the regular Qwen3.5 9B running with model offloading, and completed these tasks successfully.
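If you want to check generation speed on your own GPU, one option is to ask a locally running Ollama server for a non-streamed completion and compute tokens per second from the `eval_count` and `eval_duration` fields in its response. The sketch below is only an illustration: it assumes Ollama is listening on its default address, and it is not necessarily how the figures above were measured. The helper names `tokens_per_second` and `measure` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "reecdev/qwen3.5-lowvram:9b"


def tokens_per_second(response: dict) -> float:
    """Compute generation speed from an Ollama /api/generate response."""
    # eval_count is the number of generated tokens;
    # eval_duration is the generation time in nanoseconds.
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def measure(prompt: str, model: str = MODEL, url: str = OLLAMA_URL) -> float:
    """Send one non-streamed prompt and return the measured tokens per second."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

Calling `measure("Write a function that reverses a string.")` with the model already pulled returns a single reading; averaging several prompts gives a steadier number. Running the model with `ollama run --verbose` prints a comparable eval rate without any code.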
Qwen3.5-LowVRAM is only recommended for GPUs that cannot run the regular Qwen3.5-9B. If your GPU can run the regular Qwen3.5-9B, use that instead.