T-lite-it-2.1 is an efficient Russian-language model built on the Qwen 3 architecture, with significant improvements in instruction following and added support for tool calling (quantized, Q4_K_M).

Tags: tools, thinking, 8b

313cea45e9fc · 5.0GB · qwen3 · 8.19B · Q4_K_M
{{- $lastUserIdx := -1 -}} {{- range $idx, $msg := .Messages -}} {{- if eq $msg.Role "user" }}{{ $la
{ "repeat_penalty": 1, "stop": [ "<|im_start|>", "<|im_end|>" ], "te

Readme

Based on https://huggingface.co/t-tech/T-lite-it-2.1-GGUF

Release https://habr.com/ru/companies/tbank/articles/979650/

| Feature | Value |
|---|---|
| vision | false |
| thinking | true |
| tools | true |

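
The flags above say the model accepts tool definitions and can emit a reasoning trace. As a minimal sketch, the snippet below builds a request body for Ollama's `/api/chat` endpoint that uses both features. It assumes the model is installed under the hypothetical tag `t-lite-it-2.1` and that the server version accepts the `think` field. `get_weather` is a made-up tool used only for illustration.

```python
import json

# Hypothetical tag: substitute the name under which the model is installed locally.
MODEL_TAG = "t-lite-it-2.1"


def build_chat_request(prompt, tools=None, think=True, num_ctx=4096):
    """Return a JSON-serialisable body for a POST to Ollama's /api/chat."""
    body = {
        "model": MODEL_TAG,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,                  # ask for a single response object
        "think": think,                   # request the reasoning trace separately
        "options": {"num_ctx": num_ctx},  # context window size in tokens
    }
    if tools:
        body["tools"] = tools
    return body


# Illustrative tool definition; get_weather is not a real function.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

if __name__ == "__main__":
    request = build_chat_request("What is the weather in Kazan?", tools=[WEATHER_TOOL])
    print(json.dumps(request, indent=2, ensure_ascii=False))
```

The printed JSON can be sent to a local server with any HTTP client. Building the body separately from the network call keeps the example checkable without a running server.
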

| Device | Speed, tokens/s | Context | VRAM, GB | Versions |
|---|---|---|---|---|
| RTX 3090 24 GB | ~117 | 4096 | 6.6 | Q5_K_M, 0.13.3 |
| RTX 3090 24 GB | ~117 | 15360 | 8.3 | Q5_K_M, 0.13.3 |
| RTX 3090 24 GB | ~119 | 4096 | 5.8 | Q4_K_M, 0.13.4 |
| RTX 3090 24 GB | ~129 | 15360 | 7.5 | Q4_K_M, 0.13.4 |
| RTX 2080 Ti 11 GB | ~77 | 4096 | 6.6 | Q5_K_M, 0.13.3 |
| RTX 2080 Ti 11 GB | ~77 | 15360 | 8.3 | Q5_K_M, 0.13.3 |
| RTX 2080 Ti 11 GB | ~84 | 4096 | 5.8 | Q4_K_M, 0.13.4 |
| RTX 2080 Ti 11 GB | ~84 | 15360 | 7.5 | Q4_K_M, 0.13.4 |
| RTX 3070 Ti Mobile 8 GB | ~65 | 4096 | 6.6 | Q5_K_M, 0.13.3 |
| RTX 3070 Ti Mobile 8 GB | ~37 | 15360 | 8.3 (11%/89% CPU/GPU) | Q5_K_M, 0.13.3 |
| RTX 3070 Ti Mobile 8 GB | ~71 | 4096 | 5.8 | Q4_K_M, 0.13.4 |
| RTX 3070 Ti Mobile 8 GB | ~71 | 15360 | 7.5 | Q4_K_M, 0.13.4 |
| M1 Max 32 GB | ~36 | 4096 | 6.3 | Q5_K_M, 0.13.3 |
| M1 Max 32 GB | ~37 | 15360 | 7.2 | Q5_K_M, 0.13.3 |
| M1 Max 32 GB | ~36 | 4096 | 5.5 | Q4_K_M, 0.13.4 |
| M1 Max 32 GB | ~36 | 15360 | 6.3 | Q4_K_M, 0.13.4 |
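
The measurements above suggest a simple rule of thumb: throughput holds up while the reported footprint stays below the card's memory, and drops once part of the model spills to the CPU (the 8 GB laptop GPU at 15360 context with Q5_K_M). The sketch below encodes only the RTX figures from the table and compares them against a memory budget. The helper names are illustrative, and the comparison ignores memory used by other processes, so treat it as a rough check rather than a sizing tool.

```python
# VRAM figures in GB, copied from the RTX rows of the table:
# (quantization, context length) -> reported usage.
MEASURED_VRAM_GB = {
    ("Q5_K_M", 4096): 6.6,
    ("Q5_K_M", 15360): 8.3,
    ("Q4_K_M", 4096): 5.8,
    ("Q4_K_M", 15360): 7.5,
}


def fits_on_gpu(quant, context, gpu_vram_gb):
    """True if the measured footprint does not exceed the card's memory."""
    return MEASURED_VRAM_GB[(quant, context)] <= gpu_vram_gb


def vram_per_1k_context(quant):
    """Extra GB per additional 1024 tokens of context, from the two measured points."""
    low, high = 4096, 15360
    delta_gb = MEASURED_VRAM_GB[(quant, high)] - MEASURED_VRAM_GB[(quant, low)]
    return delta_gb / ((high - low) / 1024)


if __name__ == "__main__":
    for (quant, context), used in sorted(MEASURED_VRAM_GB.items()):
        verdict = "fits" if fits_on_gpu(quant, context, 8.0) else "spills to CPU"
        print(f"{quant} @ {context}: {used} GB on an 8 GB card -> {verdict}")
    print(f"Q4_K_M: ~{vram_per_1k_context('Q4_K_M'):.2f} GB per extra 1024 tokens")
```

On these numbers, Q5_K_M at 15360 context is the only combination that does not fit in 8 GB, which matches the row showing an 11%/89% CPU/GPU split. Both quantizations grow by roughly 0.15 GB per additional 1024 tokens of context.
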