55 pulls · updated 6 days ago

Light version - GigaChat-3.1-Lightning: it performs at roughly GPT-4o level on arena leaderboards while remaining compact and fast (quantized to Q4_K_M).

tools 10b
ollama run second_constantine/gigachat3.1:10b

Details

0cb27546b529 · 6.5GB · deepseek2 · 10.7B · Q4_K_M

Template (truncated in the page preview): {{- if .Messages }} {{- if or .System .Tools }} {{- if .System }} {{ .System }} {{- end }} {{- if .T

Params: { "stop": [ "User:", "Assistant:" ] }

Readme

Based on https://huggingface.co/ai-sage/GigaChat3.1-10B-A1.8B-GGUF (repackaged with a deepseek-style Modelfile).
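For reference, repackaging a GGUF with a custom Modelfile looks roughly like this (a sketch: the local file name is hypothetical; the stop parameters match the params shown in the Details section above):

```
FROM ./GigaChat3.1-10B-A1.8B-Q4_K_M.gguf
PARAMETER stop "User:"
PARAMETER stop "Assistant:"
```

Built with `ollama create second_constantine/gigachat3.1:10b -f Modelfile`.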

| Feature  | Value |
|----------|-------|
| vision   | false |
| thinking | false |
| tools    | true  |
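Since the model advertises tool calling, here is a minimal sketch of a tool-calling request body in the shape Ollama's /api/chat endpoint accepts (the payload is only built and serialized here, not sent; the weather tool and its schema are illustrative, not part of this model):

```python
import json

payload = {
    "model": "second_constantine/gigachat3.1:10b",
    "messages": [
        {"role": "user", "content": "What is the weather in Moscow?"}
    ],
    # OpenAI-style function schema; "get_weather" is a made-up example tool.
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "stream": False,
}

# On a running Ollama instance this would be POSTed to
# http://localhost:11434/api/chat; here we only serialize it.
body = json.dumps(payload)
print(body[:60])
```

If the model decides to call the tool, the response message carries a `tool_calls` list instead of plain content.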
| Device | Speed, tokens/s | Context | Memory, GB | Versions |
|---|---|---|---|---|
| RTX 3090 24GB | ~230 | 256k | 21 (21.271) | Q4_K_M, 0.20.2 |
| RTX 2080 Ti 11GB | ~144 | 64k | 10 (10.434) | Q4_K_M, 0.20.2 |
| RTX 3070 Ti Mobile 8GB | ~145 | 15k | 7.5 (7.763) | Q4_K_M, 0.20.2 |
| i7-12700H + 3070 Ti Mobile 8GB | ~19 | 256k | 22 (71%/29% CPU/GPU) | Q4_K_M, 0.20.2 |
| M1 Max 32GB | ~79 | 256k | 21 (25.123) | Q4_K_M, 0.20.2 |
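The memory figures above are consistent with a 10.7B-parameter model at Q4_K_M plus a context-dependent KV cache. A back-of-the-envelope sketch (the bits-per-weight figure is a rough assumption about Q4_K_M's average, not a measured value):

```python
# Rough estimate of the quantized-weights footprint alone.
params = 10.7e9          # parameter count from the model card
bits_per_weight = 4.8    # assumed Q4_K_M average across tensors
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights: ~{weights_gb:.1f} GB")
```

This lands close to the 6.5GB blob size; the gap up to the 10-22 GB totals in the table is mostly the KV cache, which grows with context length, which is why the 11GB card is limited to a 64k context while the 24GB card fits 256k.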