isotnek/qwen3.5:9B-Unsloth-UD-Q4_K_XL

505 pulls · 1 month ago

*TEXT-ONLY* Unsloth Quantization of Qwen3.5:9B

tools · thinking
ollama run isotnek/qwen3.5:9B-Unsloth-UD-Q4_K_XL

Details

ae3a55fdd011 · 6.0GB · qwen35 · 8.95B · Q4_K_M

Parameters: { "presence_penalty": 1.5, "temperature": 1, "top_k": 20, "top_p": 0.95 }

Template: {{ .Prompt }}

Readme

This is an Unsloth quantization of Qwen3.5-9B. For a full list of other quants, see the linked Unsloth HF repo. This specific quant was selected based on the analysis in this blog post, which found it to strike a good balance between performance preservation and model-size reduction.

This model is text-only because Ollama doesn’t yet support specifying mmproj files when creating Ollama Modelfiles with GGUFs. Still, this is a great model made better & faster by the good folks at Unsloth.

This model is set to reason by default. To disable reasoning in the Ollama CLI you can run:

ollama run isotnek/qwen3.5:9B-Unsloth-UD-Q4_K_XL

and then enter “/set nothink” at the interactive prompt.
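If you call the model through Ollama’s HTTP API instead of the CLI, recent Ollama versions also accept a `think` field on `/api/chat` that toggles reasoning per request. A minimal sketch of the request body (check that your Ollama version supports `think`; the prompt is just a placeholder):

```shell
# Sketch of an /api/chat body with reasoning disabled via "think": false.
# Assumes a recent Ollama version that supports the "think" field.
payload='{
  "model": "isotnek/qwen3.5:9B-Unsloth-UD-Q4_K_XL",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "think": false,
  "stream": false
}'
```

Send it to a running Ollama server with `curl http://localhost:11434/api/chat -d "$payload"` (11434 is Ollama’s default port).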

To run this (and other models in the linked Unsloth repo) for multimodal inference, I recommend instead using llama.cpp. To do so, download your desired model GGUF and mmproj-*.gguf files, and then:

brew install llama.cpp

# swap in your preferred model and mmproj files, if different;
# -ngl 99 offloads all layers to the GPU (Metal on Apple silicon)
llama-server \
  -m ./Qwen3.5-9B-UD-Q4_K_XL.gguf \
  --mmproj ./mmproj-BF16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99

and to run inference:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'$(base64 -i /Path/To/Your/Image.png)'"}}
      ]
    }]
  }'
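One caveat with the inline `$(base64 …)` substitution above: a large image can exceed the kernel’s argument-length limit (ARG_MAX), and the curl call will fail. A workaround sketch that assembles the body in a file instead (the image path and prompt are placeholders):

```shell
# Build the request body in a file to sidestep ARG_MAX on large images:
# shell variables and the printf builtin are not passed through exec,
# so they are not subject to the argument-length limit.
IMG=/Path/To/Your/Image.png
b64=$(base64 -i "$IMG" | tr -d '\n')  # tr strips line-wrap newlines, if any
printf '{
  "model": "qwen",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What is in this image?"},
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,%s"}}
    ]
  }]
}' "$b64" > payload.json
```

Then run the same curl command as above, replacing the inline `-d '{…}'` body with `--data @payload.json`.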