21 3 days ago

T-lite-it-2.1 is an efficient Russian model built upon the Qwen 3 architecture, featuring significant improvements in instruction following and adds support for tool-calling capabilities (quantized Q4_K_M)

tools thinking 8b

Models

View all →

Readme

Based on https://huggingface.co/t-tech/T-lite-it-2.1-GGUF

Release https://habr.com/ru/companies/tbank/articles/979650/

Feature Value
vision false
thinking true
tools true
Device Speed, token/s Context VRAM, gb Versions
RTX 3090 24gb ~117 4096 6.6 Q5_K_M, 0.13.3
RTX 3090 24gb ~117 15360 8.3 Q5_K_M, 0.13.3
RTX 3090 24gb ~119 4096 5.8 Q4_K_M, 0.13.4
RTX 3090 24gb ~129 15360 7.5 Q4_K_M, 0.13.4
RTX 2080ti 11gb ~77 4096 6.6 Q5_K_M, 0.13.3
RTX 2080ti 11gb ~77 15360 8.3 Q5_K_M, 0.13.3
RTX 2080ti 11gb ~84 4096 5.8 Q4_K_M, 0.13.4
RTX 2080ti 11gb ~84 15360 7.5 Q4_K_M, 0.13.4
RTX 3070ti Mobile 8gb ~65 4096 6.6 Q5_K_M, 0.13.3
RTX 3070ti Mobile 8gb ~37 15360 8.3 (11%/89% CPU/GPU) Q5_K_M, 0.13.3
RTX 3070ti Mobile 8gb ~71 4096 5.8 Q4_K_M, 0.13.4
RTX 3070ti Mobile 8gb ~71 15360 7.5 Q4_K_M, 0.13.4
M1 Max 32gb ~36 4096 6.3 Q5_K_M, 0.13.3
M1 Max 32gb ~37 15360 7.2 Q5_K_M, 0.13.3
M1 Max 32gb ~36 4096 5.5 Q4_K_M, 0.13.4
M1 Max 32gb ~36 15360 6.3 Q4_K_M, 0.13.4