Credits to Zhipu AI, Unsloth, and llama.cpp; see below.
This is the quantized version of the GLM-4.5-Air model by Zhipu AI, specifically prepared for use with Ollama.
The quantized weights originate from Unsloth’s sharded GGUF release and were merged into one file using llama.cpp utilities.
Recommended Ollama version: >= v0.11.5-rc2
The chat template has been reproduced to mirror the features described in the GLM-4.5 paper and in the original Jinja template. However, not all Jinja-template features can be faithfully reimplemented, because the current Ollama version does not provide equivalents for every syntax element. In particular, the model was trained to express tool calls with an XML-style envelope, for example:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>
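To make the native format concrete, here is a minimal Python sketch of how such an envelope could be parsed into a function name and an argument dict. The function name, regex approach, and the example call (get_weather) are illustrative assumptions, not part of the model's tooling.

```python
import re

def parse_xml_tool_call(text):
    """Parse GLM-4.5's XML-style tool-call envelope into
    (function_name, arguments_dict). A minimal sketch: assumes one
    well-formed call and string-valued arguments."""
    m = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if m is None:
        return None
    body = m.group(1)
    # The function name is the text before the first <arg_key> tag.
    name = body.split("<arg_key>", 1)[0].strip()
    keys = re.findall(r"<arg_key>(.*?)</arg_key>", body, re.DOTALL)
    values = re.findall(r"<arg_value>(.*?)</arg_value>", body, re.DOTALL)
    return name, dict(zip(keys, values))

# Hypothetical model output in the native format:
example = """<tool_call>get_weather
<arg_key>city</arg_key>
<arg_value>Beijing</arg_value>
<arg_key>unit</arg_key>
<arg_value>celsius</arg_value>
</tool_call>"""

print(parse_xml_tool_call(example))
# → ('get_weather', {'city': 'Beijing', 'unit': 'celsius'})
```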
Ollama does not currently support this format. For compatibility, the chat template adopts a schema similar to Qwen3, returning tool calls as a JSON object:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
Because this schema differs from the tool-calling format the model was optimized for during training, behavior and performance may deviate from the reference results.
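For comparison, a client consuming this Qwen3-style output could extract the call with a short JSON parse. This is a hedged sketch under the assumption of one well-formed call per envelope; the example payload is hypothetical.

```python
import json
import re

def parse_json_tool_call(text):
    """Parse a Qwen3-style <tool_call> envelope containing a JSON
    object into a dict with "name" and "arguments" keys.
    A minimal sketch; assumes one well-formed call per envelope."""
    m = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if m is None:
        return None
    return json.loads(m.group(1))

# Hypothetical model output in the JSON-envelope format:
reply = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Beijing"}}\n</tool_call>'
print(parse_json_tool_call(reply))
# → {'name': 'get_weather', 'arguments': {'city': 'Beijing'}}
```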
Original Model:
GLM-4.5-Air
Created and released by Zhipu AI
Official Hugging Face repository:
https://huggingface.co/zai-org/GLM-4.5-Air
Quantized Version:
Unsloth’s Quantized GGUF Release
The quantized weights (sharded) are from Unsloth’s Hugging Face repository:
https://huggingface.co/unsloth/GLM-4.5-Air-GGUF
Merge & Packaging:
llama.cpp Tools
The sharded GGUF files from Unsloth were merged into a single GGUF file using the official llama.cpp utility (llama-gguf-split --merge).