
Instruct version of the YandexGPT 5 Lite large language model with 8B parameters and a context length of 32k tokens (Q5_K_M quantized build).

Readme

Based on https://huggingface.co/mradermacher/YandexGPT-5-Lite-8B-instruct-GGUF
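If you prefer not to use this repackaged model, the same GGUF can be pulled straight from the Hugging Face repo above using Ollama's `hf.co/...` reference syntax. This is a sketch, not the official install command for this page; the `:Q5_K_M` tag is assumed to match the quantization named in the description:

```shell
# Run the Q5_K_M GGUF directly from the source Hugging Face repo
# (requires a recent Ollama build with hf.co model references)
ollama run hf.co/mradermacher/YandexGPT-5-Lite-8B-instruct-GGUF:Q5_K_M
```

The first invocation downloads the weights; subsequent runs start from the local cache.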

| Feature | Value |
| --- | --- |
| vision | false |
| thinking | false |
| tools | false |
| Device | Speed, token/s | Context | VRAM, GB | Version (quant, Ollama) |
| --- | --- | --- | --- | --- |
| RTX 3090 24 GB | ~105 | 4096 | 6.9 | Q5_K_M, 0.12.2 |
| RTX 3090 24 GB | ~105 | 15360 | 9.2 | Q5_K_M, 0.12.2 |
| RTX 2080 Ti 11 GB | ~74 | 4096 | 6.9 | Q5_K_M, 0.12.2 |
| RTX 2080 Ti 11 GB | ~75 | 15360 | 9.2 | Q5_K_M, 0.12.2 |
| M1 Max 32 GB | ~41 | 4096 | 6.6 | Q5_K_M, 0.12.2 |
| M1 Max 32 GB | ~41 | 15360 | 8.2 | Q5_K_M, 0.12.2 |
| RTX 3070 Ti Mobile 8 GB | ~60 | 4096 | 6.9 | Q5_K_M, 0.12.3 |
| RTX 3070 Ti Mobile 8 GB | ~23 | 15360 | 9.2 (14%/86% CPU/GPU) | Q5_K_M, 0.12.3 |