Credits to Zhipu AI, Unsloth, and llama.cpp; see below.
This is the quantized version of the GLM-4.6 model by Zhipu AI, specifically prepared for use with Ollama.
The quantized weights originate from Unsloth’s sharded GGUF release and were merged into one file using llama.cpp utilities.
Recommended Ollama version: >= v0.12.6-rc0
The chat template has been reproduced to mirror the features described in the GLM-4.5 paper and in the original Jinja template. However, not all Jinja-template features can be faithfully reimplemented, because the current Ollama version does not provide equivalents for every syntax element. In particular, the model was trained to express tool calls with an XML-style envelope, for example:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>
Ollama does not currently support this format. For compatibility, the chat template adopts a schema similar to Qwen3, returning tool calls as a JSON object:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
Because this tool-calling schema differs from the one the model was optimized for during training, tool-calling behavior and performance may differ from the reference results.
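To make the difference between the two envelopes concrete, here is a minimal Python sketch that converts the XML-style tool call into the JSON-style one. It is not part of the chat template or of Ollama; the function names (xml_tool_call_to_json, render_json_tool_call) and the example call (get_weather, city, days) are hypothetical. It also assumes that each argument value should be decoded as JSON where possible and kept as a plain string otherwise, which may not match the model's actual conventions.

```python
import json
import re

# Matches one XML-style tool-call envelope and captures its body.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
# Matches one <arg_key>/<arg_value> pair inside the envelope.
ARG_PAIR_RE = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def xml_tool_call_to_json(text):
    """Convert the first XML-style tool call in `text` to a JSON-style dict.

    Returns None if no <tool_call> envelope is found.
    """
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    body = match.group(1)
    # The function name is everything before the first <arg_key>.
    name = body.split("<arg_key>", 1)[0].strip()
    arguments = {}
    for key, raw in ARG_PAIR_RE.findall(body):
        raw = raw.strip()
        try:
            # Assumption: values that parse as JSON (numbers, objects, ...)
            # are decoded; anything else is kept as a plain string.
            value = json.loads(raw)
        except json.JSONDecodeError:
            value = raw
        arguments[key.strip()] = value
    return {"name": name, "arguments": arguments}


def render_json_tool_call(call):
    """Render a tool-call dict in the JSON-style envelope."""
    return "<tool_call>\n" + json.dumps(call) + "\n</tool_call>"


# Hypothetical example input in the XML-style format.
EXAMPLE = (
    "<tool_call>get_weather\n"
    "<arg_key>city</arg_key>\n"
    "<arg_value>Berlin</arg_value>\n"
    "<arg_key>days</arg_key>\n"
    "<arg_value>3</arg_value>\n"
    "</tool_call>"
)

if __name__ == "__main__":
    print(render_json_tool_call(xml_tool_call_to_json(EXAMPLE)))
```

A converter like this could be useful when comparing outputs from a runtime that emits the original XML-style calls against the JSON-style calls produced through this template.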
Original Model:
GLM-4.6
Created and released by Zhipu AI
Official Hugging Face repository:
https://huggingface.co/zai-org/GLM-4.6
Quantized Version:
Unsloth’s Quantized GGUF Release
The quantized weights (sharded) are from Unsloth’s Hugging Face repository:
https://huggingface.co/unsloth/GLM-4.6-GGUF
Merge & Packaging:
llama.cpp Tools
The sharded GGUF files from Unsloth were merged into a single GGUF file using the official llama.cpp utilities (llama-gguf-split --merge).
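As a rough illustration of the merge step, the Python sketch below builds the llama-gguf-split command line. It assumes the tool is given the first shard followed by the output path; the file names and the build_merge_command helper are hypothetical and are not the exact names used for this release.

```python
import shlex


def build_merge_command(first_shard, merged_output, tool="llama-gguf-split"):
    """Return the argument list for merging sharded GGUF files into one file.

    Assumption: the tool takes the first shard and the output path, and
    locates the remaining shards next to the first one.
    """
    return [tool, "--merge", first_shard, merged_output]


# Hypothetical file names, for illustration only.
cmd = build_merge_command("model-00001-of-00005.gguf", "model-merged.gguf")
print(shlex.join(cmd))

# To actually run the merge (requires the llama.cpp tools on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.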