176 1 week ago

GLM-4.6 is a hybrid reasoning model that provides two modes: a thinking mode for complex reasoning and tool use, and a non-thinking mode for immediate responses.

tools thinking

262c708de660 · 84GB · glm4moe · 357B · IQ1_S
[gMASK]<sop> {{- if .Tools }}<|system|> # Tools You may call one or more functions to assist with th
{ "stop": [ "<|system|>", "<|user|>", "<|assistant|>" ] }

Readme

I AM NOT THE ORIGINAL AUTHOR – JUST UPLOADING A QUANTIZED OLLAMA VERSION BECAUSE IT WASN’T AVAILABLE YET 🤓

CREDITS TO ZHIPU AI, UNSLOTH, AND LLAMA.CPP, SEE BELOW.

GLM-4.6 (Quantized GGUF) – Attribution & Provenance

This is the quantized version of the GLM-4.6 model by Zhipu AI, specifically prepared for use with Ollama.
The quantized weights originate from Unsloth’s sharded GGUF release and were merged into one file using llama.cpp utilities.

Recommended Ollama version: >= v0.12.6-rc0
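One way to check an installed version against this recommendation is sketched below. It assumes version strings shaped like `0.12.6` or `v0.12.6-rc0`; the helper names are made up for illustration, and any other suffix is rejected rather than guessed at.

```python
import re

# Minimum version recommended in this README.
MINIMUM_VERSION = "v0.12.6-rc0"

_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?")


def version_key(version: str) -> tuple:
    """Turn '0.12.6' or 'v0.12.6-rc0' into a tuple that sorts correctly."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    rc = match.group(4)
    if rc is None:
        # A final release sorts after every release candidate of the same number.
        return (major, minor, patch, 1, 0)
    return (major, minor, patch, 0, int(rc))


def meets_minimum(installed: str, minimum: str = MINIMUM_VERSION) -> bool:
    """Return True if the installed version is at least the recommended one."""
    return version_key(installed) >= version_key(minimum)
```

Pass only the version number itself, for example `meets_minimum("0.12.6")`, not the full line printed by `ollama --version`.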

Disclaimer

The chat template has been reproduced to mirror the features described in the GLM-4.5 paper and in the original Jinja template. However, not all Jinja-template features can be faithfully reimplemented, because the current Ollama version does not provide equivalents for every syntax element. In particular, the model was trained to express tool calls with an XML-style envelope, for example:

<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>

Ollama does not currently support this format. For compatibility, the chat template adopts a schema similar to Qwen3, returning tool calls as a JSON object:

<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>

As the tool-calling schema is not the one for which the model was optimized during training, behavior and performance may vary from the reference results.
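To make the difference between the two envelopes concrete, here is a minimal Python sketch that reads either form out of raw model text and normalizes it to the same `{"name", "arguments"}` shape. It is an illustration only, not the parser Ollama uses. The function names and the sample `get_weather` tool are hypothetical, and decoding each `<arg_value>` as JSON with a fallback to plain text is an assumption about how values are serialized.

```python
import json
import re

# Matches one <tool_call>...</tool_call> block, including newlines inside it.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
# Matches one key/value pair of the XML-style envelope.
ARG_RE = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def _decode_value(raw: str):
    """Assumption: values are JSON where possible, plain strings otherwise."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def parse_tool_calls(text: str) -> list:
    """Extract tool calls from raw text in either envelope format."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        body = body.strip()
        if body.startswith("{"):
            # JSON form used by this chat template.
            obj = json.loads(body)
            calls.append(
                {"name": obj["name"], "arguments": obj.get("arguments", {})}
            )
        else:
            # XML-style form: the function name precedes the first <arg_key>.
            name = body.split("<arg_key>", 1)[0].strip()
            arguments = {
                key.strip(): _decode_value(value.strip())
                for key, value in ARG_RE.findall(body)
            }
            calls.append({"name": name, "arguments": arguments})
    return calls
```

For example, both `<tool_call>\n{"name": "get_weather", "arguments": {"city": "Berlin"}}\n</tool_call>` and the equivalent `<arg_key>`/`<arg_value>` form yield a call named `get_weather` with `city` set to `"Berlin"`.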

Model Lineage and Credits

  1. Original Model:
    GLM-4.6
    Created and released by Zhipu AI
    Official Hugging Face repository:
    https://huggingface.co/zai-org/GLM-4.6

  2. Quantized Version:
    Unsloth’s Quantized GGUF Release
    The quantized weights (sharded) are from Unsloth’s Hugging Face repository:
    https://huggingface.co/unsloth/GLM-4.6-GGUF

  3. Merge & Packaging:
    llama.cpp Tools
    The sharded GGUF files from Unsloth were merged into a single GGUF file using the official llama.cpp utilities (llama-gguf-split --merge).
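The merge step can be sketched in Python as follows. Only the `llama-gguf-split --merge` invocation comes from this README; the helper names and any file names passed in are hypothetical, and the sketch assumes `llama-gguf-split` is on `PATH` and is given the first shard plus an output path.

```python
import shutil
import subprocess


def merge_command(first_shard: str, output_path: str) -> list:
    """Build the llama-gguf-split command that merges shards into one file.

    first_shard is the path to the first shard of the set (hypothetical
    name chosen by the caller); output_path is the merged GGUF to write.
    """
    return ["llama-gguf-split", "--merge", first_shard, output_path]


def run_merge(first_shard: str, output_path: str) -> None:
    """Run the merge, failing early if the tool is not installed."""
    if shutil.which("llama-gguf-split") is None:
        raise FileNotFoundError("llama-gguf-split not found on PATH")
    # check=True raises CalledProcessError if the merge exits non-zero.
    subprocess.run(merge_command(first_shard, output_path), check=True)
```

Keeping command construction separate from execution makes it possible to print and inspect the exact command before starting a merge of this size.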