youtu/youtu:latest

759 Downloads · Updated 1 month ago

Youtu-LLM is a new, small yet powerful LLM: it contains only 1.96B parameters, supports a 128k context, and has native agentic capabilities. Not yet runnable: it requires an Ollama build with the latest llama.cpp changes integrated.

tools thinking 2b
ollama run youtu/youtu
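
Once an Ollama build with the required llama.cpp support ships, the model can also be called programmatically. A minimal sketch using the official `ollama` Python client (assumes `pip install ollama`, a running local server, and that the model has been pulled; the prompt is just an example):

```python
# Minimal sketch: one chat turn against the locally pulled model.
import ollama

response = ollama.chat(
    model="youtu/youtu",
    messages=[{"role": "user", "content": "In two sentences, what is multi-head latent attention?"}],
)
print(response["message"]["content"])
```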

Details

Updated 1 month ago

a0fe102b0112 · 2.1GB

  • Architecture: deepseek2
  • Parameters: 1.96B
  • Quantization: Q8_0

System prompt:
You are Youtu-LLM, a helpful AI assistant developed by Tencent Youtu Lab.

Template (truncated):
<|begin_of_text|>{{- if .System }}{{ .System }}{{ else }}You are Youtu-LLM, a helpful AI assistant d…

Parameters (truncated):
{ "num_ctx": 8192, "repeat_penalty": 1.05, "stop": [ "<|end_of_text|>", …

Readme

🎯 Brief Introduction

This repository hosts the Ollama model package for Tencent's Youtu-LLM-2B.

Youtu-LLM is a new, small yet powerful LLM: it contains only 1.96B parameters, supports a 128k context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in Commonsense, STEM, Coding, and Long Context capabilities; in agent-related testing, it surpasses larger leading models and is genuinely capable of completing multiple end-to-end agent tasks.
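
As a concrete illustration of the tool-use side (the model is tagged `tools` above), here is a minimal sketch of a tool-call round trip through the Ollama chat API. The `get_weather` tool is a hypothetical example; only the `tools`/`tool_calls` fields come from the standard API:

```python
# Minimal sketch: expose one (hypothetical) tool and inspect the model's tool call.
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = ollama.chat(
    model="youtu/youtu",
    messages=[{"role": "user", "content": "What's the weather in Shenzhen?"}],
    tools=tools,
)

# If the model decided to call the tool, the request it produced is listed here.
for call in response["message"].get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```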

Youtu-LLM has the following features:

  • Type: Autoregressive Causal Language Models with Dense MLA
  • Release versions: Base and Instruct
  • Number of Parameters: 1.96B
  • Number of Layers: 32
  • Number of Attention Heads (MLA): 16 for Q/K/V
  • MLA Rank: 1,536 for Q, 512 for K/V
  • MLA Dim: 128 for QK Nope, 64 for QK Rope, and 128 for V
  • Context Length: 131,072
  • Vocabulary Size: 128,256
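
To make the MLA numbers above concrete, here is a shape-only sketch of how the listed ranks and head dimensions fit together in DeepSeek-style Multi-head Latent Attention. This is a toy walkthrough with random weights, not the actual implementation, and the hidden size is an assumption (it is not listed in the spec):

```python
# Toy, shape-only walkthrough of the MLA dimensions listed above (illustration only).
import numpy as np

d_model = 2048   # hidden size: an ASSUMPTION for illustration, not listed in the spec
n_heads = 16     # attention heads for Q/K/V
q_rank  = 1536   # MLA rank for Q
kv_rank = 512    # MLA rank for K/V -- this small latent is what the KV cache stores
d_nope  = 128    # per-head Q/K dim without rotary position encoding
d_rope  = 64     # per-head Q/K dim that carries the rotary position encoding
d_v     = 128    # per-head V dim

rng = np.random.default_rng(0)
x = rng.standard_normal((4, d_model))  # hidden states for 4 tokens

# Queries: compress to a 1,536-dim latent, then expand to 16 heads of (128 + 64) dims.
q_latent = x @ rng.standard_normal((d_model, q_rank))
q = q_latent @ rng.standard_normal((q_rank, n_heads * (d_nope + d_rope)))
print(q.reshape(4, n_heads, d_nope + d_rope).shape)   # (4, 16, 192)

# Keys/values: compress to a shared 512-dim latent, then expand per head.
kv_latent = x @ rng.standard_normal((d_model, kv_rank))
k_nope = kv_latent @ rng.standard_normal((kv_rank, n_heads * d_nope))
v = kv_latent @ rng.standard_normal((kv_rank, n_heads * d_v))
print(k_nope.reshape(4, n_heads, d_nope).shape)       # (4, 16, 128)
print(v.reshape(4, n_heads, d_v).shape)               # (4, 16, 128)
```

The payoff of the low-rank path is memory: per token, the cache only needs the 512-dim K/V latent (plus a small rope component) instead of 16 full key and value heads.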

🤗 Model Download

| Model Name | Description | Download |
| --- | --- | --- |
| Youtu-LLM-2B-Base | Base model of Youtu-LLM-2B | 🤗 Model |
| Youtu-LLM-2B | Instruct model of Youtu-LLM-2B | 🤗 Model |
| Youtu-LLM-2B-GGUF | Instruct model of Youtu-LLM-2B, in GGUF format | 🤗 Model |
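
If you want the raw GGUF rather than the Ollama package, it can be fetched with `huggingface_hub`. A minimal sketch, where the repo id and filename are hypothetical placeholders (substitute the actual ones behind the 🤗 links above):

```python
# Minimal sketch: download the GGUF weights directly from Hugging Face.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="tencent/Youtu-LLM-2B-GGUF",  # HYPOTHETICAL repo id; use the real link above
    filename="Youtu-LLM-2B-Q8_0.gguf",    # HYPOTHETICAL filename; Q8_0 matches this package
)
print(path)
```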

📊 Performance Comparisons

Instruct Model

Comparison between Youtu-LLM-2B and baselines

General Benchmarks

| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | DeepSeek-R1-Distill-Llama-8B | Youtu-LLM-2B |
| --- | --- | --- | --- | --- | --- | --- |
| **Commonsense Knowledge Reasoning** | | | | | | |
| MMLU-Redux | 53.0% | 74.1% | 75.6% | 83.8% | 78.1% | 75.8% |
| MMLU-Pro | 36.5% | 54.9% | 53.0% | 69.1% | 57.5% | 61.6% |
| **Instruction Following & Text Reasoning** | | | | | | |
| IFEval | 29.4% | 70.4% | 60.4% | 83.6% | 34.6% | 81.2% |
| DROP | 41.3% | 72.5% | 72.0% | 82.9% | 73.1% | 86.7% |
| MUSR | 43.8% | 56.6% | 54.1% | 60.5% | 59.7% | 57.4% |
| **STEM** | | | | | | |
| MATH-500 | 84.8% | 89.8% | 91.8% | 95.0% | 90.8% | 93.7% |
| AIME 24 | 30.2% | 44.2% | 46.7% | 73.3% | 52.5% | 65.4% |
| AIME 25 | 23.1% | 37.1% | 34.2% | 64.2% | 34.4% | 49.8% |
| GPQA-Diamond | 33.6% | 36.9% | 43.8% | 55.2% | 45.5% | 48.0% |
| BBH | 31.0% | 69.1% | 76.3% | 87.8% | 77.8% | 77.5% |
| **Coding** | | | | | | |
| HumanEval | 64.0% | 84.8% | 79.9% | 95.4% | 88.1% | 95.9% |
| HumanEval+ | 59.5% | 76.2% | 74.7% | 87.8% | 82.5% | 89.0% |
| MBPP | 51.5% | 80.5% | 66.7% | 92.3% | 73.9% | 85.0% |
| MBPP+ | 44.2% | 67.7% | 56.7% | 77.6% | 61.0% | 71.7% |
| LiveCodeBench v6 | 19.8% | 30.7% | 30.8% | 48.5% | 36.8% | 43.7% |

Agentic Benchmarks

| Benchmark | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | Youtu-LLM-2B |
| --- | --- | --- | --- | --- |
| **Deep Research** | | | | |
| GAIA | 11.4% | 11.7% | 25.5% | 33.9% |
| xbench | 11.7% | 13.9% | 18.4% | 19.5% |
| **Code** | | | | |
| SWE-Bench-Verified | 0.6% | 7.2% | 5.7% | 17.7% |
| EnConda-Bench | 10.8% | 3.5% | 16.1% | 21.5% |
| **Tool** | | | | |
| BFCL V3 | 55.5% | 31.5% | 61.7% | 58.0% |
| τ²-Bench | 2.6% | 9.7% | 10.9% | 15.0% |