846 pulls · Updated 4 months ago

GLM-4-0414 32B with 128k context (YaRN RoPE scaling). Needs Ollama >= 0.6.6.

tools

67a2a027906e · 20GB

glm4 · 32.6B · Q4_K_M

Template: [gMASK]<sop>{{- /* ---------- tools section ---------- */}} {{- if .Tools }} <|system|> # Available …

Params: { "num_ctx": 64000, "stop": [ "<|system|>", "<|user|>", "<|assistant …

Readme

Quantized with YaRN RoPE scaling to 128k context (scaling factor 4). This needs Ollama >= 0.6.6 to run. The `num_ctx` in the Modelfile defaults to 64k only because I don't have gobs of VRAM; raise it if you have the memory for the full window.
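If you do want the full 128k window, one way is a small derived Modelfile that overrides `num_ctx` (a minimal sketch; the `FROM` tag here is a placeholder, substitute the actual tag of this model):

```
# Hypothetical derived Modelfile; the FROM tag is a placeholder for this model's tag.
FROM glm4-yarn:latest
# Raise the context window from the 64k default to the full 128k.
PARAMETER num_ctx 131072
```

Build it with `ollama create glm4-128k -f Modelfile` and run as usual. Alternatively, override per session in the Ollama REPL with `/set parameter num_ctx 131072`, or per request through the API by passing `"options": {"num_ctx": 131072}`.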