second_constantine/t-lite-it-2.1

second_constantine/

t-lite-it-2.1

21 Downloads Updated 3 days ago

T-lite-it-2.1 is an efficient Russian model built upon the Qwen 3 architecture, featuring significant improvements in instruction following and adds support for tool-calling capabilities (quantized Q4_K_M)

tools thinking 8b

Models

Name

3 models

Size

Context

Input

t-lite-it-2.1:8b

5.0GB · 40K context window · Text · 3 days ago

t-lite-it-2.1:8b

5.0GB

40K

Text

Readme

Based on https://huggingface.co/t-tech/T-lite-it-2.1-GGUF

Release https://habr.com/ru/companies/tbank/articles/979650/

Feature	Value
vision	false
thinking	true
tools	true

Device	Speed, token/s	Context	VRAM, gb	Versions
RTX 3090 24gb	~117	4096	6.6	Q5_K_M, 0.13.3
RTX 3090 24gb	~117	15360	8.3	Q5_K_M, 0.13.3
RTX 3090 24gb	~119	4096	5.8	Q4_K_M, 0.13.4
RTX 3090 24gb	~129	15360	7.5	Q4_K_M, 0.13.4
RTX 2080ti 11gb	~77	4096	6.6	Q5_K_M, 0.13.3
RTX 2080ti 11gb	~77	15360	8.3	Q5_K_M, 0.13.3
RTX 2080ti 11gb	~84	4096	5.8	Q4_K_M, 0.13.4
RTX 2080ti 11gb	~84	15360	7.5	Q4_K_M, 0.13.4
RTX 3070ti Mobile 8gb	~65	4096	6.6	Q5_K_M, 0.13.3
RTX 3070ti Mobile 8gb	~37	15360	8.3 (11%/89% CPU/GPU)	Q5_K_M, 0.13.3
RTX 3070ti Mobile 8gb	~71	4096	5.8	Q4_K_M, 0.13.4
RTX 3070ti Mobile 8gb	~71	15360	7.5	Q4_K_M, 0.13.4
M1 Max 32gb	~36	4096	6.3	Q5_K_M, 0.13.3
M1 Max 32gb	~37	15360	7.2	Q5_K_M, 0.13.3
M1 Max 32gb	~36	4096	5.5	Q4_K_M, 0.13.4
M1 Max 32gb	~36	15360	6.3	Q4_K_M, 0.13.4