82 3 days ago

Fine-tuned Qwen3.5-9B with distilled reasoning and full vision support. 883 tensors (427 text + 441 vision + 15 MTP) — vision tower preserved byte-for-byte from base via llama-export-lora merge.

Details

7713708e5839 · 6.3GB · qwen35 · 9.65B · Q4_K_M
{ "num_ctx": 262144, "stop": [ "<|im_end|>" ], "temperature": 0.6, "top_

Readme

r7_research_vision_nutrition_label.png


Qwen3.5-9B R7 Research Vision (Q4_K_M)

Capabilities

  • Vision — image understanding (reads text, describes scenes, answers visual questions)
  • Thinking — structured reasoning in <think> blocks
  • Tool calling — structured tool_calls via Ollama /api/chat
  • Instruction following — concise answers, format constraints, system prompt adherence
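As a rough illustration of the tool-calling capability listed above, the sketch below builds a /api/chat request that offers the model a single tool. The `get_weather` tool, its schema, and the helper names `tool_chat_payload` and `tool_chat` are hypothetical and not part of this model; the server address assumes a default local Ollama install.

```shell
MODEL="robit/qwen3.5-9b-r7-research-vision:q4km"

# Print a /api/chat request body that offers one tool to the model.
# get_weather is a hypothetical tool used only for illustration.
# $1 = user prompt (must not contain double quotes or backslashes,
#      because it is interpolated into the JSON without escaping).
tool_chat_payload() {
  cat <<EOF
{
  "model": "$MODEL",
  "stream": false,
  "messages": [{"role": "user", "content": "$1"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}
EOF
}

# Send the body to a local Ollama server (assumes the default port 11434).
tool_chat() {
  tool_chat_payload "$1" | curl -s http://localhost:11434/api/chat -d @-
}
```

Calling `tool_chat "What is the weather in Paris?"` sends the request. If the model decides to use the tool, the call should appear under `message.tool_calls` in the JSON response rather than as plain text in `message.content`.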

Eval Results

| Benchmark | Score |
| --- | --- |
| Diverse stochastic eval (38 tests) | 86.8% |
| Vision probe (rendered text) | PASS (reads “42” from image) |
| Tool calling | PASS (structured tool_calls) |
| Thinking | PASS (produces thinking field) |
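The 86.8% figure can be sanity-checked against the stated test count. Assuming the score is a plain pass rate over 38 equally weighted tests (the source does not say how it was computed), 33 passes is the only whole number that rounds to 86.8%. The `pass_rate` helper below is a hypothetical name used for this check.

```shell
# Convert a pass count into a percentage with one decimal place.
# $1 = tests passed, $2 = total tests.
pass_rate() {
  awk -v p="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * p / t }'
}

pass_rate 33 38   # prints 86.8
```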

Training

Quickstart

ollama run robit/qwen3.5-9b-r7-research-vision:q4km

Image chat

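# Base64-encode the image on a single line (-w0 is GNU coreutils; on macOS use: base64 -i path/to/image.jpg)
IMG64=$(base64 -w0 path/to/image.jpg)
# Send the prompt and the encoded image to the local Ollama server
curl -s http://localhost:11434/api/chat \
  -d '{"model":"robit/qwen3.5-9b-r7-research-vision:q4km","messages":[{"role":"user","content":"Describe this image.","images":["'"$IMG64"'"]}]}'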
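Large images can push an inline `-d '...'` argument past the shell's argument-length limit. One possible workaround, sketched here, builds the request body in a function and pipes it to curl on stdin. The helper names `image_chat_payload` and `image_chat` are hypothetical, and `"stream": false` is added so the reply arrives as a single JSON object.

```shell
MODEL="robit/qwen3.5-9b-r7-research-vision:q4km"

# Print a /api/chat request body with one base64-encoded image attached.
# $1 = image path, $2 = prompt (must not contain double quotes or
#      backslashes, because it is inserted into the JSON without escaping).
image_chat_payload() {
  # Reading from stdin and stripping newlines works with GNU and BSD base64.
  img64=$(base64 < "$1" | tr -d '\n')
  printf '{"model":"%s","stream":false,"messages":[{"role":"user","content":"%s","images":["%s"]}]}\n' \
    "$MODEL" "$2" "$img64"
}

# Pipe the body to curl on stdin instead of passing it as an argument.
image_chat() {
  image_chat_payload "$1" "$2" | curl -s http://localhost:11434/api/chat -d @-
}
```

For example, `image_chat path/to/image.jpg "Describe this image."` sends the same request as the command above.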

Parameters

  • RENDERER qwen3.5 + PARSER qwen3.5 (enables tool calling + vision)
  • num_ctx 262144 (max context)
  • temperature 0.6, top_p 0.95
  • stop "<|im_end|>"
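The defaults above can be overridden per request through the `options` field of /api/chat, which may help when a 262144-token context does not fit in available memory. The sketch below is one way to do that; the helper name `chat_payload_with_ctx` and the value 32768 in the usage note are illustrative assumptions, not recommendations from the model author.

```shell
MODEL="robit/qwen3.5-9b-r7-research-vision:q4km"

# Print a /api/chat body that overrides num_ctx for this request only,
# keeping the sampling defaults listed above.
# $1 = context length in tokens, $2 = prompt (no double quotes or backslashes).
chat_payload_with_ctx() {
  printf '{"model":"%s","stream":false,"options":{"num_ctx":%d,"temperature":0.6,"top_p":0.95},"messages":[{"role":"user","content":"%s"}]}\n' \
    "$MODEL" "$1" "$2"
}
```

A request with a smaller context could then be sent with `chat_payload_with_ctx 32768 "Hello" | curl -s http://localhost:11434/api/chat -d @-`.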

License

Derived from Qwen3.5-9B (Apache 2.0). Training data licenses vary by source.