Qwen3.6-27B-MTP GGUF model, quantized to Q8_0, with multimodal vision projector support.

Qwen3.6 Vision Models

Custom Ollama imports of Qwen3.6 GGUF models with multimodal projector support.

This repository contains several vision-capable Qwen3.6 variants, among them the 27B dense model and the 35B A3B Mixture-of-Experts (MoE) model.

Available Tags

27B Dense Vision Model

ollama run zhamm/qwen3.6:27b-mtp-q8-vision

35B A3B MOE Vision Model

ollama run zhamm/qwen3.6:35b-a3b-MOE-q8-vision

Model Details

27b-mtp-q8-vision

  • Base model: Qwen3.6-27B-MTP
  • Architecture: dense model
  • Parameters: approximately 27B
  • Quantization: Q8_0
  • Vision projector: multimodal projector included
  • Approximate total size: 30 GB
  • Context window: up to 256K tokens
  • Input support: Text and Image

35b-a3b-MOE-q8-vision

  • Base model: Qwen3.6-35B-A3B-MOE
  • Architecture: Mixture-of-Experts
  • Parameters: approximately 35B
  • Active parameters: approximately 3B
  • Quantization: Q8_0
  • Vision projector: multimodal projector included
  • Approximate total size: 39 GB
  • Context window: up to 256K tokens
  • Input support: Text and Image
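
As a rough cross-check of the sizes listed above: Q8_0 stores weights in blocks of 32 8-bit values plus one 16-bit scale, which works out to 34 bytes per 32 weights (8.5 bits per weight). The sketch below applies that ratio to the approximate parameter counts. `q8_0_weight_gb` is a hypothetical helper name, and the estimate ignores the vision projector, any tensors kept at higher precision, and file metadata.

```shell
# q8_0_weight_gb is a hypothetical helper, not part of Ollama.
# Argument: parameter count in billions. Prints estimated weight size in GB (10^9 bytes).
q8_0_weight_gb() {
  # Q8_0 block: 32 int8 weights (32 bytes) + one fp16 scale (2 bytes) = 34 bytes per 32 weights.
  awk -v params_b="$1" 'BEGIN { printf "%.1f\n", params_b * 34 / 32 }'
}

q8_0_weight_gb 27   # prints 28.7
q8_0_weight_gb 35   # prints 37.2
```

Both estimates land in the same range as the approximate totals listed above, with the remainder plausibly taken up by the projector and non-quantized tensors.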

Intended Use

These models are intended for local AI use with Ollama, including:

  • General chat
  • Reasoning
  • Coding assistance
  • Technical troubleshooting
  • Document analysis
  • Repository-level code review
  • Image-text tasks where supported by the Ollama runtime and client

The 27B dense model is a good general-purpose option when you prefer dense-model behavior, where every parameter is used for each token.

The 35B A3B MOE model is useful for experimenting with Mixture-of-Experts inference: the full model has approximately 35B parameters, but only a small subset (approximately 3B) is active for each token.

Recommended Hardware

Recommended GPU VRAM:

  • Minimum: 48 GB VRAM
  • Preferred: 64 GB+ VRAM
  • Ideal: 96 GB VRAM for larger context windows, image input, and experimentation

The 35B MOE model is larger on disk and may require more memory than the 27B dense model, especially when using long context windows.

On smaller GPUs, you may need to reduce the context length, or the runtime may offload part of the model to CPU and system RAM, which can be much slower.

Suggested Settings

Recommended starting settings:

Context length: 32768
Temperature: 0.6
Top-p: 0.9

For coding, technical troubleshooting, and structured work, consider using a lower temperature such as:

Temperature: 0.2 - 0.4
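
These settings can be applied per request through the `options` field of the Ollama chat API (`num_ctx`, `temperature`, `top_p`). The following is a minimal sketch rather than a definitive client: `build_chat_request` is a hypothetical helper name, and it assumes the prompt contains no double quotes or backslashes, since it does no JSON escaping.

```shell
# build_chat_request is a hypothetical helper, not part of Ollama.
# Usage: build_chat_request MODEL PROMPT [TEMPERATURE]
# Prints a /api/chat request body using the starting settings above.
# Assumption: PROMPT contains no double quotes or backslashes (no JSON escaping is done).
build_chat_request() {
  model="$1"
  prompt="$2"
  temperature="${3:-0.6}"   # default to the recommended starting temperature
  printf '{"model":"%s","stream":false,"messages":[{"role":"user","content":"%s"}],"options":{"num_ctx":32768,"temperature":%s,"top_p":0.9}}\n' \
    "$model" "$prompt" "$temperature"
}
```

To send a request with a lower temperature for coding work, pipe the body into curl:

```shell
build_chat_request zhamm/qwen3.6:27b-mtp-q8-vision "Review this function for bugs." 0.3 \
  | curl -s http://localhost:11434/api/chat -d @-
```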

For long-context use, Flash Attention and a q8_0-quantized KV cache are recommended where supported.

Very large context settings can significantly increase memory usage, because the KV cache grows with context length.
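
In Ollama, Flash Attention and the KV cache type are set on the server rather than per request. A minimal sketch, assuming an Ollama version that reads `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` from the server's environment (check the FAQ for your version):

```shell
# Assumption: the Ollama server reads these variables at startup.
# KV cache quantization generally requires Flash Attention to be enabled.
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0

# Restart the server from this shell so the settings take effect, for example:
#   ollama serve
```

If Ollama runs as a background service, the variables need to be set in that service's environment instead of an interactive shell.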

Usage

Run the 27B dense model:

ollama run zhamm/qwen3.6:27b-mtp-q8-vision

Run the 35B A3B MOE model:

ollama run zhamm/qwen3.6:35b-a3b-MOE-q8-vision

Example API call using the 27B model:

curl http://localhost:11434/api/chat \
  -d '{
    "model": "zhamm/qwen3.6:27b-mtp-q8-vision",
    "messages": [
      {
        "role": "user",
        "content": "Explain the difference between ARM and x86 CPUs."
      }
    ]
  }'

Example API call using the 35B MOE model:

curl http://localhost:11434/api/chat \
  -d '{
    "model": "zhamm/qwen3.6:35b-a3b-MOE-q8-vision",
    "messages": [
      {
        "role": "user",
        "content": "Explain the difference between dense and mixture-of-experts language models."
      }
    ]
  }'

Vision Usage

When calling the Ollama API directly, pass each image as base64-encoded data in the message's "images" array.

curl http://localhost:11434/api/chat \
  -d '{
    "model": "zhamm/qwen3.6:35b-a3b-MOE-q8-vision",
    "messages": [
      {
        "role": "user",
        "content": "Describe this image.",
        "images": ["BASE64_IMAGE_DATA_HERE"]
      }
    ]
  }'

Vision support depends on the Ollama runtime, the client being used, and how image input is passed to the model.

If image input does not work through a specific UI, test directly with the Ollama API or use a runtime with explicit multimodal projector support.
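
To test image input directly against the API, the base64 data has to be generated from a real file. The sketch below shows one way to do that. `build_vision_request` is a hypothetical helper name, `photo.jpg` in the usage example is a placeholder path, and the code assumes a `base64` utility that reads from standard input and a prompt without double quotes or backslashes.

```shell
# build_vision_request is a hypothetical helper, not part of Ollama.
# Usage: build_vision_request MODEL IMAGE_PATH PROMPT
# Prints a /api/chat request body with the image embedded as base64.
# Assumption: PROMPT contains no double quotes or backslashes (no JSON escaping is done).
build_vision_request() {
  model="$1"
  image_path="$2"
  prompt="$3"
  # Some base64 implementations wrap their output; strip newlines so the JSON string stays valid.
  image_b64=$(base64 < "$image_path" | tr -d '\n')
  printf '{"model":"%s","stream":false,"messages":[{"role":"user","content":"%s","images":["%s"]}]}\n' \
    "$model" "$prompt" "$image_b64"
}
```

Example usage with a local image file:

```shell
build_vision_request zhamm/qwen3.6:35b-a3b-MOE-q8-vision photo.jpg "Describe this image." \
  | curl -s http://localhost:11434/api/chat -d @-
```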

Notes

The 27b-mtp-q8-vision model includes MTP (multi-token prediction) support in the source GGUF. Whether MTP acceleration is actually used depends on the Ollama version, the underlying llama.cpp build, and the client.

The 35b-a3b-MOE-q8-vision model uses a Mixture-of-Experts architecture. The full model has approximately 35B parameters, but only about 3B are active for each token.

For best results with large context windows, use a system with sufficient VRAM and RAM.

These models are intended for experimentation, private local AI use, and technical workflows where larger local models are useful.

Attribution

Source models:

  • Qwen3.6-27B-MTP-GGUF
  • Qwen3.6-35B-A3B-MOE-GGUF

Please review and follow the upstream model licenses before redistribution or commercial use.