robit/qwen3.5-9b-r7-research-vision:q4km

594 Downloads Updated 3 months ago

Fine-tuned Qwen3.5-9B with distilled reasoning and full vision support. 883 tensors (427 text + 441 vision + 15 MTP) — vision tower preserved byte-for-byte from base via llama-export-lora merge.

vision tools thinking

ollama run robit/qwen3.5-9b-r7-research-vision:q4km

curl http://localhost:11434/api/chat \
  -d '{
    "model": "robit/qwen3.5-9b-r7-research-vision:q4km",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='robit/qwen3.5-9b-r7-research-vision:q4km',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'robit/qwen3.5-9b-r7-research-vision:q4km',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details

7713708e5839 · 6.3GB · Updated 3 months ago

model: arch qwen35 · parameters 9.65B · quantization Q4_K_M · 6.3GB

params: { "num_ctx": 262144, "stop": [ "<|im_end|>" ], "temperature": 0.6, "top_ · 82B

Readme


![r7_research_vision_nutrition_label.png](/assets/robit/qwen3.5-9b-r7-research-vision/7c6d5e17-b524-4a50-b769-afcf70258b3f)

---

# Qwen3.5-9B R7 Research Vision (Q4_K_M)

Fine-tuned Qwen3.5-9B with distilled reasoning and full vision support. 883 tensors (427 text + 441 vision + 15 MTP) — vision tower preserved byte-for-byte from base via `llama-export-lora` merge.

## Capabilities

- **Vision** — image understanding (reads text, describes scenes, answers visual questions)
- **Thinking** — structured reasoning in `<think>` blocks
- **Tool calling** — structured `tool_calls` via Ollama `/api/chat`
- **Instruction following** — concise answers, format constraints, system prompt adherence
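
As a rough illustration of the tool-calling capability, the sketch below builds an `/api/chat` request that advertises one tool and reads any `tool_calls` out of the reply. It uses only the Python standard library. The `get_weather` tool and the helper names are hypothetical and exist only for this example. The request and response shapes follow the general Ollama chat API, not anything specific to this model.

```python
import json
import urllib.request

MODEL = "robit/qwen3.5-9b-r7-research-vision:q4km"

# Hypothetical tool definition, used only for illustration.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt, tools):
    """Build a non-streaming /api/chat payload that advertises tools."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "stream": False,
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from a decoded /api/chat response."""
    calls = response.get("message", {}).get("tool_calls") or []
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]


def send(payload, url="http://localhost:11434/api/chat"):
    """POST the payload to a local Ollama server and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a local server running, `extract_tool_calls(send(build_tool_request("What is the weather in Paris?", [WEATHER_TOOL])))` would return whatever calls the model chose to emit. An empty list means it answered in plain text instead.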

## Eval Results

| Benchmark | Score |
|-----------|-------|
| Diverse stochastic eval (38 tests) | **86.8%** |
| Vision probe (rendered text) | **PASS** (reads "42" from image) |
| Tool calling | **PASS** (structured tool_calls) |
| Thinking | **PASS** (produces thinking field) |
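
The 86.8% figure can be turned back into a pass count, assuming the 38 tests are equally weighted and scored pass/fail. The README does not state that scoring scheme, so treat this as a sanity check, not the author's method. The function name is illustrative.

```python
from fractions import Fraction

TOTAL_TESTS = 38         # from the eval table
REPORTED_PERCENT = 86.8  # from the eval table


def consistent_pass_counts(total, reported_percent):
    """Pass counts whose percentage rounds (one decimal) to the reported score."""
    return [
        passed
        for passed in range(total + 1)
        if round(float(Fraction(passed, total) * 100), 1) == reported_percent
    ]


print(consistent_pass_counts(TOTAL_TESTS, REPORTED_PERCENT))  # [33]
```

Under that assumption, the only consistent result is 33 of 38 tests passed (33/38 ≈ 86.84%).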

## Training

- **Base model**: [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)
- **Method**: LoRA SFT (r=32, alpha=64, LR=1e-4, 1 epoch), merged via `llama-export-lora` to preserve vision
- **Data**: Additive mix of 4043 samples from:
  - [bespokelabs/Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k) — DeepSeek-R1 reasoning traces
  - [allenai/tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture) — instruction diversity
  - [Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) — curated GPT-4 instructions
  - [PrimeIntellect/SYNTHETIC-1-SFT-Data](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-1-SFT-Data) — verified math/code/STEM
- **Vision preservation**: the LoRA adapter is filtered (`linear_attn` removed), converted with `convert_lora_to_gguf`, and merged into the base Q4_K_M GGUF with `llama-export-lora`. All 441 vision tensors and 15 MTP tensors are left unchanged.
- **Training suite**: [robit-man/fine_tuning_suite](https://github.com/robit-man/fine_tuning_suite)
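
Two details from the list above can be checked with a few lines of Python: the scale applied to the LoRA update for r=32 and alpha=64, and the idea of dropping `linear_attn` entries from an adapter before conversion. This is a minimal sketch that assumes the conventional `alpha / r` scaling, which the README does not confirm. The tensor names are hypothetical, since real adapter keys depend on the training framework.

```python
LORA_RANK = 32   # r, from the README
LORA_ALPHA = 64  # alpha, from the README


def lora_scale(alpha, rank):
    """Conventional LoRA scaling factor applied to the B @ A update."""
    return alpha / rank


def drop_modules(adapter, excluded="linear_attn"):
    """Keep only adapter tensors whose name does not mention `excluded`."""
    return {name: t for name, t in adapter.items() if excluded not in name}


# Hypothetical tensor names; placeholder strings stand in for the weights.
adapter = {
    "layers.0.self_attn.q_proj.lora_A": "...",
    "layers.0.self_attn.q_proj.lora_B": "...",
    "layers.1.linear_attn.in_proj.lora_A": "...",
    "layers.1.linear_attn.in_proj.lora_B": "...",
}

filtered = drop_modules(adapter)
print(lora_scale(LORA_ALPHA, LORA_RANK))  # 2.0
print(sorted(filtered))                   # only the self_attn entries remain
print(427 + 441 + 15)                     # 883, the tensor count quoted above
```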

## Quickstart

```bash
ollama run robit/qwen3.5-9b-r7-research-vision:q4km
```

### Image chat

```bash
# -w0 disables line wrapping (GNU coreutils); on macOS use: base64 -i path/to/image.jpg
IMG64=$(base64 -w0 path/to/image.jpg)
curl -s http://localhost:11434/api/chat \
  -d '{"model":"robit/qwen3.5-9b-r7-research-vision:q4km","messages":[{"role":"user","content":"Describe this image.","images":["'"$IMG64"'"]}]}'
```
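
The same image request can be sketched in Python with only the standard library, which avoids shell quoting and platform differences in `base64`. The helper names are illustrative. The payload mirrors the curl call above, with `"stream": false` added so the reply arrives as a single JSON object.

```python
import base64
import json
import urllib.request

MODEL = "robit/qwen3.5-9b-r7-research-vision:q4km"


def build_image_request(image_bytes, prompt="Describe this image."):
    """Build an /api/chat payload carrying one base64-encoded image."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt, "images": [image_b64]}],
        "stream": False,
    }


def describe_image(path, url="http://localhost:11434/api/chat"):
    """Send the image at `path` to a local Ollama server and return the reply text."""
    with open(path, "rb") as f:
        payload = build_image_request(f.read())
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Calling `describe_image("path/to/image.jpg")` requires a running Ollama server with the model pulled.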

## Parameters

- `RENDERER qwen3.5` + `PARSER qwen3.5` (enables tool calling + vision)
- `num_ctx 262144` (max context)
- `temperature 0.6`, `top_p 0.95`
- `stop "<|im_end|>"`
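
These defaults can be overridden per request through the `options` field of `/api/chat`. The sketch below merges caller overrides onto the values listed above. The helper name and the `num_ctx=8192` override are illustrative assumptions; a smaller context window is a common way to reduce memory use, but a suitable value depends on your hardware.

```python
MODEL = "robit/qwen3.5-9b-r7-research-vision:q4km"

# Defaults as listed in the Parameters section.
DEFAULT_OPTIONS = {
    "num_ctx": 262144,
    "temperature": 0.6,
    "top_p": 0.95,
    "stop": ["<|im_end|>"],
}


def build_request(prompt, **overrides):
    """Build an /api/chat payload whose options override the model defaults."""
    options = {**DEFAULT_OPTIONS, **overrides}
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "options": options,
        "stream": False,
    }


payload = build_request("Hello!", num_ctx=8192)
print(payload["options"]["num_ctx"])      # 8192
print(payload["options"]["temperature"])  # 0.6
```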

## License

Derived from Qwen3.5-9B (Apache 2.0). Training data licenses vary by source.
