youtu/youtu:latest

759 Downloads · Updated 1 month ago

Youtu-LLM is a new, small yet powerful LLM: it contains only 1.96B parameters, supports a 128k context, and has native agentic capabilities. Not yet runnable: it requires an Ollama build with the latest llama.cpp changes integrated.

tools thinking 2b
ollama run youtu/youtu
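
Once an Ollama build with the required llama.cpp support ships, the model can also be called programmatically. A minimal sketch using the official `ollama` Python client (assumes `pip install ollama`, a running local server, and that the model has been pulled; the prompt is just an example):

```python
# Minimal sketch: one chat turn against the locally pulled model.
import ollama

response = ollama.chat(
    model="youtu/youtu",
    messages=[{"role": "user", "content": "In two sentences, what is multi-head latent attention?"}],
)
print(response["message"]["content"])
```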

Details

Updated 1 month ago

a0fe102b0112 · 2.1GB

  • Architecture: deepseek2
  • Parameters: 1.96B
  • Quantization: Q8_0

System prompt:
You are Youtu-LLM, a helpful AI assistant developed by Tencent Youtu Lab.

Template (truncated):
<|begin_of_text|>{{- if .System }}{{ .System }}{{ else }}You are Youtu-LLM, a helpful AI assistant d…

Parameters (truncated):
{ "num_ctx": 8192, "repeat_penalty": 1.05, "stop": [ "<|end_of_text|>", …

Readme

🎯 Brief Introduction

This repository hosts the Ollama model package for Tencent's Youtu-LLM-2B.

Youtu-LLM is a new, small yet powerful LLM: it contains only 1.96B parameters, supports a 128k context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in Commonsense, STEM, Coding, and Long Context capabilities; in agent-related testing, it surpasses larger leading models and is genuinely capable of completing multiple end-to-end agent tasks.
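
As a concrete illustration of the tool-use side (the model is tagged `tools` above), here is a minimal sketch of a tool-call round trip through the Ollama chat API. The `get_weather` tool is a hypothetical example; only the `tools`/`tool_calls` fields come from the standard API:

```python
# Minimal sketch: expose one (hypothetical) tool and inspect the model's tool call.
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = ollama.chat(
    model="youtu/youtu",
    messages=[{"role": "user", "content": "What's the weather in Shenzhen?"}],
    tools=tools,
)

# If the model decided to call the tool, the request it produced is listed here.
for call in response["message"].get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```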

Youtu-LLM has the following features:

  • Type: Autoregressive Causal Language Models with Dense MLA
  • Release versions: Base and Instruct
  • Number of Parameters: 1.96B
  • Number of Layers: 32
  • Number of Attention Heads (MLA): 16 for Q/K/V
  • MLA Rank: 1,536 for Q, 512 for K/V
  • MLA Dim: 128 for QK Nope, 64 for QK Rope, and 128 for V
  • Context Length: 131,072
  • Vocabulary Size: 128,256
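
To make the MLA numbers above concrete, here is a shape-only sketch of how the listed ranks and head dimensions fit together in DeepSeek-style Multi-head Latent Attention. This is a toy walkthrough with random weights, not the actual implementation, and the hidden size is an assumption (it is not listed in the spec):

```python
# Toy, shape-only walkthrough of the MLA dimensions listed above (illustration only).
import numpy as np

d_model = 2048   # hidden size: an ASSUMPTION for illustration, not listed in the spec
n_heads = 16     # attention heads for Q/K/V
q_rank  = 1536   # MLA rank for Q
kv_rank = 512    # MLA rank for K/V -- this small latent is what the KV cache stores
d_nope  = 128    # per-head Q/K dim without rotary position encoding
d_rope  = 64     # per-head Q/K dim that carries the rotary position encoding
d_v     = 128    # per-head V dim

rng = np.random.default_rng(0)
x = rng.standard_normal((4, d_model))  # hidden states for 4 tokens

# Queries: compress to a 1,536-dim latent, then expand to 16 heads of (128 + 64) dims.
q_latent = x @ rng.standard_normal((d_model, q_rank))
q = q_latent @ rng.standard_normal((q_rank, n_heads * (d_nope + d_rope)))
print(q.reshape(4, n_heads, d_nope + d_rope).shape)   # (4, 16, 192)

# Keys/values: compress to a shared 512-dim latent, then expand per head.
kv_latent = x @ rng.standard_normal((d_model, kv_rank))
k_nope = kv_latent @ rng.standard_normal((kv_rank, n_heads * d_nope))
v = kv_latent @ rng.standard_normal((kv_rank, n_heads * d_v))
print(k_nope.reshape(4, n_heads, d_nope).shape)       # (4, 16, 128)
print(v.reshape(4, n_heads, d_v).shape)               # (4, 16, 128)
```

The payoff of the low-rank path is memory: per token, the cache only needs the 512-dim K/V latent (plus a small rope component) instead of 16 full key and value heads.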

🤗 Model Download

| Model Name | Description | Download |
| --- | --- | --- |
| Youtu-LLM-2B-Base | Base model of Youtu-LLM-2B | 🤗 Model |
| Youtu-LLM-2B | Instruct model of Youtu-LLM-2B | 🤗 Model |
| Youtu-LLM-2B-GGUF | Instruct model of Youtu-LLM-2B, in GGUF format | 🤗 Model |
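
If you want the raw GGUF rather than the Ollama package, it can be fetched with `huggingface_hub`. A minimal sketch, where the repo id and filename are hypothetical placeholders (substitute the actual ones behind the 🤗 links above):

```python
# Minimal sketch: download the GGUF weights directly from Hugging Face.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="tencent/Youtu-LLM-2B-GGUF",  # HYPOTHETICAL repo id; use the real link above
    filename="Youtu-LLM-2B-Q8_0.gguf",    # HYPOTHETICAL filename; Q8_0 matches this package
)
print(path)
```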

📊 Performance Comparisons

Instruct Model

Comparison between Youtu-LLM-2B and baselines

General Benchmarks

| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | DeepSeek-R1-Distill-Llama-8B | Youtu-LLM-2B |
| --- | --- | --- | --- | --- | --- | --- |
| **Commonsense Knowledge Reasoning** | | | | | | |
| MMLU-Redux | 53.0% | 74.1% | 75.6% | 83.8% | 78.1% | 75.8% |
| MMLU-Pro | 36.5% | 54.9% | 53.0% | 69.1% | 57.5% | 61.6% |
| **Instruction Following & Text Reasoning** | | | | | | |
| IFEval | 29.4% | 70.4% | 60.4% | 83.6% | 34.6% | 81.2% |
| DROP | 41.3% | 72.5% | 72.0% | 82.9% | 73.1% | 86.7% |
| MUSR | 43.8% | 56.6% | 54.1% | 60.5% | 59.7% | 57.4% |
| **STEM** | | | | | | |
| MATH-500 | 84.8% | 89.8% | 91.8% | 95.0% | 90.8% | 93.7% |
| AIME 24 | 30.2% | 44.2% | 46.7% | 73.3% | 52.5% | 65.4% |
| AIME 25 | 23.1% | 37.1% | 34.2% | 64.2% | 34.4% | 49.8% |
| GPQA-Diamond | 33.6% | 36.9% | 43.8% | 55.2% | 45.5% | 48.0% |
| BBH | 31.0% | 69.1% | 76.3% | 87.8% | 77.8% | 77.5% |
| **Coding** | | | | | | |
| HumanEval | 64.0% | 84.8% | 79.9% | 95.4% | 88.1% | 95.9% |
| HumanEval+ | 59.5% | 76.2% | 74.7% | 87.8% | 82.5% | 89.0% |
| MBPP | 51.5% | 80.5% | 66.7% | 92.3% | 73.9% | 85.0% |
| MBPP+ | 44.2% | 67.7% | 56.7% | 77.6% | 61.0% | 71.7% |
| LiveCodeBench v6 | 19.8% | 30.7% | 30.8% | 48.5% | 36.8% | 43.7% |

Agentic Benchmarks

| Benchmark | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | Youtu-LLM-2B |
| --- | --- | --- | --- | --- |
| **Deep Research** | | | | |
| GAIA | 11.4% | 11.7% | 25.5% | 33.9% |
| xbench | 11.7% | 13.9% | 18.4% | 19.5% |
| **Code** | | | | |
| SWE-Bench-Verified | 0.6% | 7.2% | 5.7% | 17.7% |
| EnConda-Bench | 10.8% | 3.5% | 16.1% | 21.5% |
| **Tool** | | | | |
| BFCL V3 | 55.5% | 31.5% | 61.7% | 58.0% |
| τ²-Bench | 2.6% | 9.7% | 10.9% | 15.0% |