Custom Ollama imports of Qwen3.6 GGUF models with multimodal projector support.
This repository includes multiple Qwen3.6 vision-capable model variants, including the 27B dense model and the 35B A3B Mixture-of-Experts model.
ollama run zhamm/qwen3.6:27b-mtp-q8-vision
ollama run zhamm/qwen3.6:35b-a3b-MOE-q8-vision
Tag | Source model | Quantization
27b-mtp-q8-vision | Qwen3.6-27B-MTP | Q8_0
35b-a3b-MOE-q8-vision | Qwen3.6-35B-A3B-MOE | Q8_0
These models are intended for local AI use with Ollama, including:
The 27B dense model is a good general-purpose option when consistent dense-model behavior is preferred.
The 35B A3B MOE model is useful for experimentation with Mixture-of-Experts inference, where the full model has a larger parameter count but only a smaller subset of parameters is active for each token.
Recommended GPU VRAM:
The 35B MOE model is larger on disk and may require more memory than the 27B dense model, especially when using long context windows.
Smaller GPUs may require reduced context length or may fall back to CPU/RAM offload, which can be much slower.
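As a rough illustration of why the 35B model needs more memory, the sketch below estimates the size of the weights alone at Q8_0. It assumes Q8_0 stores each block of 32 weights as 32 8-bit values plus one 16-bit scale (34 bytes per block, about 8.5 bits per weight) and uses the nominal 27B and 35B parameter counts. The helper name estimate_q8_gb is hypothetical. The estimate ignores the multimodal projector, the KV cache, and runtime overhead, so real usage will be higher.

```shell
# Hypothetical helper: estimate Q8_0 weight size in GB (10^9 bytes)
# from a nominal parameter count given in billions.
# Assumes 34 bytes per block of 32 weights (about 8.5 bits per weight).
estimate_q8_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 34 / 32 }'
}

estimate_q8_gb 27   # nominal 27B dense model
estimate_q8_gb 35   # nominal 35B MOE model
```

Because a Mixture-of-Experts model still keeps all experts loaded, the full parameter count (not the active subset) is what determines this weight footprint.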
Recommended starting settings:
Context length: 32768
Temperature: 0.6
Top-p: 0.9
For coding, technical troubleshooting, and structured work, consider using a lower temperature such as:
Temperature: 0.2 - 0.4
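When calling the Ollama API, the settings above can be passed per request in the "options" field (num_ctx, temperature, top_p). The following is a minimal sketch, not a definitive client: the helper name build_chat_payload is hypothetical, and it does no JSON escaping, so the prompt must not contain double quotes or backslashes.

```shell
# Hypothetical helper: print an /api/chat request body that applies the
# starting settings. Arguments: model tag, prompt, temperature (default 0.6).
# The prompt is inserted as-is (no JSON escaping in this sketch).
build_chat_payload() {
  cat <<EOF
{
  "model": "$1",
  "messages": [{"role": "user", "content": "$2"}],
  "stream": false,
  "options": {"num_ctx": 32768, "temperature": ${3:-0.6}, "top_p": 0.9}
}
EOF
}

# Print a request body with a lower temperature for coding work.
build_chat_payload "zhamm/qwen3.6:27b-mtp-q8-vision" "Write a bash loop." 0.3
```

To send the request to a local server on the default port, pipe the output to: curl http://localhost:11434/api/chat -d @-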
For long-context use, Flash Attention and q8_0 KV cache are recommended where supported.
Very large context settings can significantly increase memory usage.
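In recent Ollama releases, Flash Attention and KV cache quantization are typically controlled through environment variables read when the server starts. The sketch below assumes the server is launched manually from the same shell. If Ollama runs as a system service, the variables need to be set in the service configuration instead, and the Ollama documentation for your version should be checked for availability.

```shell
# Set these before starting the Ollama server; they are read at startup.
export OLLAMA_FLASH_ATTENTION=1     # enable Flash Attention where supported
export OLLAMA_KV_CACHE_TYPE=q8_0    # store the KV cache as q8_0 instead of f16

# Then start the server in the same shell, for example:
# ollama serve
```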
Run the 27B dense model:
ollama run zhamm/qwen3.6:27b-mtp-q8-vision
Run the 35B A3B MOE model:
ollama run zhamm/qwen3.6:35b-a3b-MOE-q8-vision
Example API call using the 27B model:
curl http://localhost:11434/api/chat \
  -d '{
    "model": "zhamm/qwen3.6:27b-mtp-q8-vision",
    "messages": [
      {
        "role": "user",
        "content": "Explain the difference between ARM and x86 CPUs."
      }
    ]
  }'
Example API call using the 35B MOE model:
curl http://localhost:11434/api/chat \
  -d '{
    "model": "zhamm/qwen3.6:35b-a3b-MOE-q8-vision",
    "messages": [
      {
        "role": "user",
        "content": "Explain the difference between dense and mixture-of-experts language models."
      }
    ]
  }'
When using the Ollama API directly, image input should be passed as base64-encoded image data in the "images" array of the message.
curl http://localhost:11434/api/chat \
  -d '{
    "model": "zhamm/qwen3.6:35b-a3b-MOE-q8-vision",
    "messages": [
      {
        "role": "user",
        "content": "Describe this image.",
        "images": ["BASE64_IMAGE_DATA_HERE"]
      }
    ]
  }'
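To produce the base64 string for the request above, an image file can be encoded on the command line. This is a minimal sketch under stated assumptions: the helper names encode_image and build_vision_payload are hypothetical, and a base64 utility that reads from standard input is assumed to be installed.

```shell
# Hypothetical helper: base64-encode a file as a single line.
# Reading from stdin and stripping newlines avoids relying on
# wrapping flags that differ between GNU and BSD base64.
encode_image() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: print a vision request body.
# Arguments: model tag, path to an image file.
build_vision_payload() {
  cat <<EOF
{
  "model": "$1",
  "messages": [
    {"role": "user", "content": "Describe this image.", "images": ["$(encode_image "$2")"]}
  ],
  "stream": false
}
EOF
}
```

For example, with photo.jpg as a placeholder file name: build_vision_payload "zhamm/qwen3.6:35b-a3b-MOE-q8-vision" photo.jpg | curl http://localhost:11434/api/chat -d @-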
Vision support depends on the Ollama runtime, the client being used, and how image input is passed to the model.
If image input does not work through a specific UI, test directly with the Ollama API or use a runtime with explicit multimodal projector support.
The 27b-mtp-q8-vision model includes MTP (multi-token prediction) support in the source GGUF. Runtime support for MTP acceleration may vary depending on Ollama, llama.cpp, and client support.
The 35b-a3b-MOE-q8-vision model uses a Mixture-of-Experts architecture. The full model has approximately 35B parameters, but only a smaller subset of parameters is active for each token.
For best results with large context windows, use a system with sufficient VRAM and RAM.
These models are intended for experimentation, private local AI use, and technical workflows where larger local models are useful.
Source models:
Qwen3.6-27B-MTP-GGUF
Qwen3.6-35B-A3B-MOE-GGUF
Please review and follow the upstream model licenses before redistribution or commercial use.