
Instruct version of the YandexGPT 5 Lite large language model with 8B parameters and a context length of 32k tokens (Q5_K_M quantized build).

Readme

Based on https://huggingface.co/mradermacher/YandexGPT-5-Lite-8B-instruct-GGUF
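If you prefer not to use this repackaged model, the same GGUF can be pulled straight from the Hugging Face repo above using Ollama's `hf.co/...` reference syntax. This is a sketch, not the official install command for this page; the `:Q5_K_M` tag is assumed to match the quantization named in the description:

```shell
# Run the Q5_K_M GGUF directly from the source Hugging Face repo
# (requires a recent Ollama build with hf.co model references)
ollama run hf.co/mradermacher/YandexGPT-5-Lite-8B-instruct-GGUF:Q5_K_M
```

The first invocation downloads the weights; subsequent runs start from the local cache.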

| Feature | Value |
| --- | --- |
| vision | false |
| thinking | false |
| tools | false |
| Device | Speed, token/s | Context | VRAM, GB | Version (quant, Ollama) |
| --- | --- | --- | --- | --- |
| RTX 3090 24 GB | ~105 | 4096 | 6.9 | Q5_K_M, 0.12.2 |
| RTX 3090 24 GB | ~105 | 15360 | 9.2 | Q5_K_M, 0.12.2 |
| RTX 2080 Ti 11 GB | ~74 | 4096 | 6.9 | Q5_K_M, 0.12.2 |
| RTX 2080 Ti 11 GB | ~75 | 15360 | 9.2 | Q5_K_M, 0.12.2 |
| M1 Max 32 GB | ~41 | 4096 | 6.6 | Q5_K_M, 0.12.2 |
| M1 Max 32 GB | ~41 | 15360 | 8.2 | Q5_K_M, 0.12.2 |
| RTX 3070 Ti Mobile 8 GB | ~60 | 4096 | 6.9 | Q5_K_M, 0.12.3 |
| RTX 3070 Ti Mobile 8 GB | ~23 | 15360 | 9.2 (14%/86% CPU/GPU) | Q5_K_M, 0.12.3 |