169 4 days ago

Qwen/Qwen3.6-27B + 3 Qwen3.6 fine-tunes with MLP-passthrough surgery - MTP quants

ollama run mannix/omnimerge-v4-mtp:IQ3_M

Details


7c218026b8e1 · 13GB · qwen35 · 27.3B

Readme

GGUF quantizations of ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 with the MTP (Multi-Token Prediction) head retained for self-speculative decoding. Self-speculative decoding is supported on llama.cpp mainline from PR #22673 (merged 2026-05-16) onward.
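Self-speculative decoding works in two phases. The cheap MTP head drafts a few tokens ahead, and the full model checks them in a single forward pass. The sketch below illustrates the greedy version of that draft-then-verify loop. It assumes toy next-token functions in place of the real model and MTP head. The names `speculative_step`, `generate`, `draft_next`, and `target_next` are hypothetical and are not llama.cpp's API. The sketch shows why the output matches plain greedy decoding of the full model while needing fewer full-model passes. Sampling-based decoding uses a probabilistic acceptance rule instead of exact token matching.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: each maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(
    context: List[int], draft_next: NextToken, target_next: NextToken, k: int
) -> Tuple[List[int], int]:
    """One greedy draft-then-verify step.

    Returns (tokens appended this step, number of drafted tokens accepted).
    """
    # 1. Draft k tokens cheaply (the role the MTP head plays in self-speculation).
    drafted: List[int] = []
    for _ in range(k):
        drafted.append(draft_next(context + drafted))

    # 2. Verify. A real engine scores all drafted positions in one batched
    #    forward pass of the full model; here target_next is called per position.
    accepted: List[int] = []
    for tok in drafted:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the full model's own token and stop.
            return accepted + [expected], len(accepted)
        accepted.append(tok)

    # 3. Every draft was accepted, so the same pass also yields one bonus token.
    return accepted + [target_next(context + accepted)], len(accepted)


def generate(
    prompt: List[int], draft_next: NextToken, target_next: NextToken, k: int, max_new: int
) -> Tuple[List[int], int]:
    """Generate max_new tokens; returns (tokens, number of verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        new_tokens, _ = speculative_step(out, draft_next, target_next, k)
        out.extend(new_tokens)  # always at least one token, so the loop terminates
        passes += 1
    return out[: len(prompt) + max_new], passes


if __name__ == "__main__":
    # Toy "full model": next token is (last + 1) mod 10.
    def target(ctx: List[int]) -> int:
        return (ctx[-1] + 1) % 10

    # Toy "draft head": agrees with the target except after token 4.
    def draft(ctx: List[int]) -> int:
        return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10

    tokens, passes = generate([0], draft, target, k=3, max_new=8)
    print(tokens, passes)  # 8 new tokens from 3 verification passes instead of 8
```

The acceptance check keeps only drafts that the full model would have produced itself. A weak draft head therefore costs speed but never changes the greedy output.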

Up to 4x faster inference with a single session, and up to 2x with two sessions running in parallel.
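The following idealized bound from the speculative decoding literature gives a rough sense of where such figures come from. It is not a measurement of this model. Assume each drafted token is accepted independently with probability $\alpha$, that $k$ tokens are drafted per step, and that the draft cost is negligible. Under those assumptions, the expected number of tokens produced per full-model pass is:

$$
\mathbb{E}[\text{tokens per pass}] = \sum_{i=0}^{k} \alpha^{i} = \frac{1 - \alpha^{k+1}}{1 - \alpha}
$$

For example, $\alpha = 0.8$ and $k = 3$ give $(1 - 0.8^{4}) / 0.2 \approx 2.95$. The ceiling is $k + 1$ tokens per pass, reached only when every draft is accepted. Real speedups are lower than this bound because drafting and verification are not free.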
