ucx0204/glm-4.6V-Flash-Q8

547 Downloads Updated 5 months ago

vision tools thinking

ollama run ucx0204/glm-4.6V-Flash-Q8

curl http://localhost:11434/api/chat \
  -d '{
    "model": "ucx0204/glm-4.6V-Flash-Q8",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='ucx0204/glm-4.6V-Flash-Q8',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'ucx0204/glm-4.6V-Flash-Q8',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
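The examples above send text only, while the page tags this model for vision. The sketch below shows one way an image might be attached through the same /api/chat endpoint, assuming the server accepts base64-encoded images in a message's `images` field, as the Ollama REST API documents. The helper names `build_vision_request` and `send` are made up for illustration, and `send` needs a local Ollama server with the model already pulled.

```python
import base64
import json
import urllib.request

MODEL = "ucx0204/glm-4.6V-Flash-Q8"
OLLAMA_URL = "http://localhost:11434/api/chat"  # same endpoint as the curl example


def build_vision_request(prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming /api/chat body carrying one base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "stream": False,  # ask for a single JSON reply instead of a stream
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]},
        ],
    }


def send(body: dict) -> str:
    """POST the body to a local Ollama server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["message"]["content"]
```

With a server running, a call might look like `send(build_vision_request("Describe this image.", open("photo.png", "rb").read()))`, where `photo.png` is a placeholder file name.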

Applications

Claude Code: ollama launch claude --model ucx0204/glm-4.6V-Flash-Q8

Codex App: ollama launch codex-app --model ucx0204/glm-4.6V-Flash-Q8

OpenClaw: ollama launch openclaw --model ucx0204/glm-4.6V-Flash-Q8

Hermes Agent: ollama launch hermes --model ucx0204/glm-4.6V-Flash-Q8

Codex: ollama launch codex --model ucx0204/glm-4.6V-Flash-Q8

OpenCode: ollama launch opencode --model ucx0204/glm-4.6V-Flash-Q8

Models

1 model

Name: glm-4.6V-Flash-Q8:latest
Size: 12GB
Context: 128K
Input: Text, Image
Updated: 5 months ago
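The listed 128K context is the model's maximum; a server may load the model with a smaller window unless asked otherwise. The sketch below shows how a request body might ask for a specific window through the `num_ctx` option of the Ollama API. The helper `with_context` is a made-up name, and treating 128K as 128 × 1024 tokens is an assumption, since the page does not give the exact figure.

```python
MAX_CONTEXT = 128 * 1024  # assumption: "128K" read as 131072 tokens


def with_context(body: dict, num_ctx: int) -> dict:
    """Return a copy of a chat request body that requests a num_ctx-token window."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be positive")
    options = dict(body.get("options", {}))
    # Never request more than the maximum listed for the model.
    options["num_ctx"] = min(num_ctx, MAX_CONTEXT)
    return {**body, "options": options}


request_body = with_context(
    {
        "model": "ucx0204/glm-4.6V-Flash-Q8",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    32768,
)
```

Larger windows use more memory on top of the 12GB of weights, so a smaller value may be the practical choice on limited hardware.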

Readme

GLM-4.6V-Flash (Q8_0 GGUF)

This is a GGUF build of Zhipu AI's GLM-4.6V-Flash model, quantized to Q8_0 (8-bit) to keep output quality close to the original weights. The conversion and quantization were done by Unsloth.

🚀 Model Details

Original Model: Zhipu AI GLM-4.6V-Flash
Quantization: Q8_0 (8-bit) - High quality, balanced memory usage.
Format: GGUF (Compatible with Ollama)
Capabilities: Multimodal (vision and text input). "Flash" is part of the upstream model name; whether flash attention is used depends on the runtime configuration.
Source: unsloth/GLM-4.6V-Flash-GGUF
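To make the "Q8_0 (8-bit)" entry above more concrete, here is a simplified sketch of how a Q8_0 block works in ggml-style quantization: 32 weights share one 16-bit floating-point scale, and each weight is stored as a signed 8-bit integer, which comes to 8.5 bits per weight. The function names are illustrative, and this is not the converter that produced this file; details such as rounding mode may differ from the real implementation.

```python
import struct

BLOCK_SIZE = 32   # weights per Q8_0 block
SCALE_BYTES = 2   # one fp16 scale per block


def quantize_block(values):
    """Quantize 32 floats to (scale, int8 list) in the style of Q8_0."""
    if len(values) != BLOCK_SIZE:
        raise ValueError("a Q8_0 block holds exactly 32 values")
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    inverse = 1.0 / scale if scale else 0.0
    quants = [max(-127, min(127, round(v * inverse))) for v in values]
    # The scale is stored in half precision, so round-trip it through fp16.
    stored_scale = struct.unpack("<e", struct.pack("<e", scale))[0]
    return stored_scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from a quantized block."""
    return [scale * q for q in quants]


def bits_per_weight():
    """Storage cost per weight, including the shared scale."""
    return (SCALE_BYTES + BLOCK_SIZE) * 8 / BLOCK_SIZE


def estimate_bytes(n_weights):
    """Bytes needed to store n_weights in whole Q8_0 blocks."""
    blocks = -(-n_weights // BLOCK_SIZE)  # ceiling division
    return blocks * (SCALE_BYTES + BLOCK_SIZE)
```

The per-weight error after a round trip stays below one quantization step (`scale`), which is why Q8_0 is usually described as close to the original weights while taking roughly half the space of 16-bit storage.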