
Instruct version of the YandexGPT 5 Lite large language model with 8B parameters and a 32k-token context length (Q5_K_M quantised build).

Default tag: 8b · 4788a8871969 · 4.9GB · llama · 8.04B · Q4_K_M
License: YandexGPT-5-Lite-8B license agreement (Russian; truncated preview).
Template: Go chat template using `<s>`, `[SEP]`, `Ассистент:` (Assistant) and `Пользователь:` (User) markers (truncated preview).
Params: stop tokens `<s>`, `[SEP]` and the user-turn marker (truncated preview).

Readme

Based on https://huggingface.co/mradermacher/YandexGPT-5-Lite-8B-instruct-GGUF

| Feature  | Value |
|----------|-------|
| vision   | false |
| thinking | false |
| tools    | false |
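
To verify these flags on a local install, the standard Ollama `/api/show` endpoint can be queried; a sketch below with the same placeholder model tag. Newer Ollama releases include a `capabilities` list in the response, older ones only return details, parameters and template.

```python
import requests

MODEL = "yandexgpt-5-lite-8b-instruct:8b"  # placeholder tag

resp = requests.post("http://localhost:11434/api/show", json={"model": MODEL}, timeout=30)
resp.raise_for_status()
info = resp.json()

print(info.get("capabilities", []))  # e.g. vision/tools, if the server reports them
print(info.get("details", {}))       # family, parameter size, quantization level
print(info.get("parameters", ""))    # default stop tokens and other options
```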
| Device            | Speed, tokens/s | Context | VRAM, GB | Versions       |
|-------------------|-----------------|---------|----------|----------------|
| RTX 3090 24 GB    | ~105            | 4096    | 6.9      | Q5_K_M, 0.12.2 |
| RTX 3090 24 GB    | ~105            | 15360   | 9.2      | Q5_K_M, 0.12.2 |
| RTX 2080 Ti 11 GB | ~74             | 4096    | 6.9      | Q5_K_M, 0.12.2 |
| RTX 2080 Ti 11 GB | ~75             | 15360   | 9.2      | Q5_K_M, 0.12.2 |
| M1 Max 32 GB      | ~41             | 4096    | 6.6      | Q5_K_M, 0.12.2 |
| M1 Max 32 GB      | ~41             | 15360   | 8.2      | Q5_K_M, 0.12.2 |
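
The Context column corresponds to Ollama's `num_ctx` option, which is what drives the VRAM differences above. A hedged sketch of setting it per request via `/api/generate`, again with a placeholder model tag:

```python
import requests

MODEL = "yandexgpt-5-lite-8b-instruct:8b"  # placeholder tag

def generate(prompt: str, num_ctx: int = 4096) -> str:
    """Single-shot generation with an explicit context window (num_ctx)."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": MODEL,
            "prompt": prompt,
            "stream": False,
            # Larger num_ctx values reserve more VRAM, as in the table above.
            "options": {"num_ctx": num_ctx},
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate("Summarise the YandexGPT 5 Lite release in one sentence.", num_ctx=15360))
```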