8,232 Downloads · Updated 3 months ago
9 models:
GLM-4.5-Air:latest · 73GB · 128K context window · Text · 3 months ago
GLM-4.5-Air:Q2_K · 45GB · 128K context window · Text · 3 months ago
GLM-4.5-Air:Q3_K_M · 57GB · 128K context window · Text · 3 months ago
GLM-4.5-Air:Q4_K_M · 73GB · 128K context window · Text · 3 months ago
GLM-4.5-Air:Q5_K_M · 83GB · 128K context window · Text · 3 months ago
GLM-4.5-Air:Q6_K · 99GB · 128K context window · Text · 3 months ago
GLM-4.5-Air:Q8_0 · 117GB · 128K context window · Text · 3 months ago
GLM-4.5-Air:IQ1_M · 40GB · 128K context window · Text · 3 months ago
GLM-4.5-Air:BF16 · 221GB · 128K context window · Text · 3 months ago
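As a rough aid for choosing a tag, the listed sizes can be compared against available memory. The sketch below is illustrative only: the sizes are copied from the list above, while the function name and the 8 GB default headroom (a stand-in for KV cache and runtime overhead) are assumptions, not measured requirements. Actual memory use grows with context length.

```python
# Download sizes in GB, copied from the tag list above.
QUANT_SIZES_GB = {
    "IQ1_M": 40,
    "Q2_K": 45,
    "Q3_K_M": 57,
    "Q4_K_M": 73,
    "Q5_K_M": 83,
    "Q6_K": 99,
    "Q8_0": 117,
    "BF16": 221,
}


def largest_fitting_quant(available_gb, headroom_gb=8.0):
    """Return the largest tag whose file fits in the memory budget, or None.

    headroom_gb is an assumed allowance for KV cache and runtime overhead;
    it is a placeholder value, not a figure from the model authors.
    """
    budget = available_gb - headroom_gb
    fitting = [(size, tag) for tag, size in QUANT_SIZES_GB.items() if size <= budget]
    if not fitting:
        return None
    # max() on (size, tag) tuples picks the entry with the largest file size.
    return max(fitting)[1]


if __name__ == "__main__":
    for memory in (48, 96, 128):
        print(memory, "GB ->", largest_fitting_quant(memory))
```

The `latest` tag is left out of the table because it has the same size as `Q4_K_M` and presumably points to the same weights.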
CREDITS TO ZHIPU AI, UNSLOTH, AND LLAMA.CPP; SEE BELOW.
This is a quantized build of the GLM-4.5-Air model by Zhipu AI, prepared specifically for use with Ollama.
The quantized weights originate from Unsloth’s sharded GGUF release and were merged into one file using llama.cpp utilities.
Recommended Ollama version: >= v0.11.5-rc2
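To check an installed version against this recommendation, a small comparison helper may be useful. This is a minimal sketch that assumes version strings of the form `0.11.5` or `0.11.5-rc2` (optionally prefixed with `v`), such as the one reported by `ollama --version`; the function names are hypothetical.

```python
import re

MINIMUM = "0.11.5-rc2"  # recommended minimum from the line above


def parse_version(text):
    """Parse 'v0.11.5-rc2' into a sortable tuple ((0, 11, 5), rc_rank)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, rc = match.groups()
    # A final release sorts after all of its release candidates.
    rc_rank = float("inf") if rc is None else int(rc)
    return (int(major), int(minor), int(patch)), rc_rank


def meets_minimum(installed, minimum=MINIMUM):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for version in ("0.11.4", "0.11.5-rc1", "0.11.5-rc2", "0.11.5", "0.12.0"):
        print(version, meets_minimum(version))
```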
The chat template reproduces the features described in the GLM-4.5 paper and in the original Jinja template. However, not every Jinja-template feature can be reimplemented faithfully, because the current Ollama version does not provide equivalents for every syntax element. In particular, the model was trained to express tool calls with an XML-style envelope, for example:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>
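For readers who work with raw model output in this native format (for example, outside Ollama), the envelope can be parsed with a few regular expressions. The following is a minimal sketch, not the reference parser: the function name and the `get_weather` example are hypothetical, and all argument values are kept as plain strings because the envelope itself carries no type information.

```python
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ARG_PAIR_RE = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def parse_native_tool_calls(text):
    """Extract XML-style tool calls as a list of {'name', 'arguments'} dicts."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        # The function name is everything before the first <arg_key> tag.
        name = body.split("<arg_key>", 1)[0].strip()
        # Pair each key with the value that follows it; values stay strings.
        arguments = {k.strip(): v.strip() for k, v in ARG_PAIR_RE.findall(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls


if __name__ == "__main__":
    sample = (
        "<tool_call>get_weather\n"
        "<arg_key>city</arg_key>\n"
        "<arg_value>Berlin</arg_value>\n"
        "<arg_key>unit</arg_key>\n"
        "<arg_value>celsius</arg_value>\n"
        "</tool_call>"
    )
    print(parse_native_tool_calls(sample))
```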
Ollama does not currently support this format. For compatibility, the chat template adopts a schema similar to the one used by Qwen3, in which tool calls are returned as a JSON object:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
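Ollama typically parses these blocks itself and returns structured tool calls, so manual parsing is rarely needed. When inspecting raw output, though, a sketch like the one below can extract them. The function name and the `get_weather` example are hypothetical, and malformed JSON is silently skipped, which is a design choice of this sketch rather than documented behavior.

```python
import json
import re

JSON_TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_json_tool_calls(text):
    """Extract JSON-style tool calls as a list of {'name', 'arguments'} dicts."""
    calls = []
    for body in JSON_TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {
                    "name": payload["name"],
                    "arguments": payload.get("arguments", {}),
                }
            )
    return calls


if __name__ == "__main__":
    raw = (
        "Let me look that up.\n"
        "<tool_call>\n"
        '{"name": "get_weather", "arguments": {"city": "Berlin"}}\n'
        "</tool_call>"
    )
    print(parse_json_tool_calls(raw))
```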
Because this is not the tool-calling schema the model was optimized for during training, tool-calling behavior and performance may differ from the reference results.
Original Model:
GLM-4.5-Air
Created and released by Zhipu AI
Official Hugging Face repository:
https://huggingface.co/zai-org/GLM-4.5-Air
Quantized Version:
Unsloth’s Quantized GGUF Release
The quantized weights (sharded) are from Unsloth’s Hugging Face repository:
https://huggingface.co/unsloth/GLM-4.5-Air-GGUF
Merge & Packaging:
llama.cpp Tools
The sharded GGUF files from Unsloth were merged into a single GGUF file using the official llama.cpp utilities (llama-gguf-split --merge).
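The merge step can be scripted. The sketch below wraps the `llama-gguf-split --merge` invocation named above, which takes the first shard and the output path as arguments. The helper names and the shard filenames in the usage example are hypothetical placeholders, not the actual files from the Unsloth repository.

```python
import shutil
import subprocess


def build_merge_command(first_shard, output_path, binary="llama-gguf-split"):
    """Build the argv list for merging sharded GGUF files into one file."""
    # llama-gguf-split locates the remaining shards from the first one.
    return [binary, "--merge", first_shard, output_path]


def merge_shards(first_shard, output_path):
    """Run the merge; raises if the tool is missing or the merge fails."""
    if shutil.which("llama-gguf-split") is None:
        raise FileNotFoundError("llama-gguf-split not found on PATH")
    subprocess.run(build_merge_command(first_shard, output_path), check=True)


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    print(build_merge_command("model-00001-of-00002.gguf", "model-merged.gguf"))
```

Keeping command construction separate from execution makes the invocation easy to inspect or log before a long-running merge starts.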