ollama run isotnek/qwen3.5:9B-Unsloth-UD-Q4_K_XL
This is an Unsloth quantization of Qwen3.5-9B. For a full list of other quants, see the linked Unsloth HF repo. This particular quant was chosen based on the analysis in this blog post, which found it strikes a good balance between preserving performance and reducing model size.
This model is text-only because Ollama does not yet support specifying mmproj files when creating Ollama Modelfiles from GGUFs. Still, it is a great model made better and faster by the good folks at Unsloth.
This model reasons by default. To disable reasoning in the Ollama CLI, run:
ollama run isotnek/qwen3.5:9B-Unsloth-UD-Q4_K_XL
and then enter “/set nothink” in the chat window.
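Reasoning can also be disabled per request through Ollama's HTTP API (a sketch, assuming an Ollama version recent enough to support the "think" request field). The block below only builds and prints the request payload; the commented curl line shows how you would send it to a locally running server:

```shell
# Build a chat request that disables reasoning via "think": false
# (assumption: Ollama >= 0.9, which introduced the think field).
payload='{
  "model": "isotnek/qwen3.5:9B-Unsloth-UD-Q4_K_XL",
  "messages": [{"role": "user", "content": "Hello"}],
  "think": false,
  "stream": false
}'
printf '%s\n' "$payload"
# To send it (assumes Ollama is serving on its default port):
#   curl http://localhost:11434/api/chat -d "$payload"
```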
To run this (and the other models in the linked Unsloth repo) with multimodal inference, I recommend using llama.cpp instead. Download your desired model GGUF and the matching mmproj-*.gguf file, and then:
brew install llama.cpp
# -m: your preferred model file; --mmproj: your preferred mmproj file
# -ngl 99 offloads all layers to the GPU (Metal on Apple silicon)
llama-server \
  -m ./Qwen3.5-9B-UD-Q4_K_XL.gguf \
  --mmproj ./mmproj-BF16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99
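Before sending requests, you can check that the server has finished loading the model. A small sketch, assuming a llama.cpp build recent enough to expose the GET /health endpoint; curl's -sf flags make it fail quietly when nothing is listening yet:

```shell
# Poll llama-server's health endpoint; fall back to a message if the
# server is not (yet) reachable on the chosen host/port.
status=$(curl -sf http://localhost:8080/health || echo "server not ready")
echo "$status"
```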
and to run inference:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$(base64 -i /Path/To/Your/Image.png)"'"}}
      ]
    }]
  }'
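One portability caveat when embedding the image inline (an assumption if you are on Linux rather than macOS, which the brew install above suggests): GNU base64 wraps its output at 76 columns, and those embedded newlines corrupt the JSON payload. Passing -w 0 disables wrapping; macOS base64 takes its input via -i and never wraps. A minimal sketch that works under either, using a throwaway file:

```shell
# Encode a small sample file into an unwrapped data URI.
# GNU base64: -w 0 disables line wrapping; on macOS that flag fails,
# so we fall back to the BSD-style -i input form.
printf 'hello' > /tmp/sample.bin
b64=$(base64 -w 0 /tmp/sample.bin 2>/dev/null || base64 -i /tmp/sample.bin)
printf 'data:image/png;base64,%s\n' "$b64"
```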