
3B model that shouldn't be this good - crushes benchmarks through deep chain-of-thought reasoning

ollama run fauxpaslife/nanbeige4.1


Nanbeige 4.1 - 3B (Q8_0)

Original model by Nanbeige | GGUF conversion by tantk

Note: This is a very verbose model. I am impressed by its chain-of-thought quality given its size and speed.

What makes this special


First 3B model to nail BOTH reasoning AND agentic tool use. Most small models pick one lane - this crushes both.

Built with SFT + RL on top of Nanbeige4-3B-Base. It emits its internal chain-of-thought reasoning inside <think> blocks, and those blocks can get long.
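Because the reasoning arrives inside <think> tags, downstream code usually wants to separate it from the final answer. A minimal sketch, assuming the <think>…</think> convention described above (the function name is my own, not part of the model's API):

```python
import re

# Matches one <think>...</think> reasoning block, including newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion.

    Everything inside <think>...</think> is treated as chain-of-thought;
    whatever remains is the user-facing answer.
    """
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

raw = "<think>2+2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
```

Keeping the reasoning around (rather than discarding it) is handy for debugging routing or tool-use decisions.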

Benchmark highlights

Punches 10x above its weight:

  • Deep Search: 69.9 (Qwen3-32B: 31.6) 🤯
  • Arena-Hard-v2: 73.2 (beats Qwen3-32B's 56.0)
  • Code: 76.9 LiveCodeBench-V6
  • Math: 87.4 AIME 2026, 53.4 IMO-Answer-Bench
  • Science: 83.8 GPQA, 12.6 HLE
  • Tool Use: 56.5 BFCL-V4, supports 500+ round tool chains

Best for

  • Multi-step reasoning tasks
  • Complex routing decisions (medical/emotional/activity)
  • RAG with deep semantic search
  • Agentic workflows with tool calling
  • Fast local inference with GPT-4 class reasoning depth
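
For the agentic use case, a completion typically comes back with a structured tool call that your harness must dispatch. A minimal dispatcher sketch, assuming the common {"name": ..., "arguments": {...}} tool-call shape used by Ollama-style chat APIs (the weather tool is a made-up example, not something this model ships with):

```python
import json

# Hypothetical local tools the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub result for illustration

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Run one model-issued tool call and return the result string.

    Expects {"name": ..., "arguments": {...}}, the shape chat APIs
    hand back when the model decides to use a tool.
    """
    fn = TOOLS[tool_call["name"]]  # KeyError here means an unknown tool
    return fn(**tool_call["arguments"])

# Simulated model output containing one tool call.
raw = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
result = dispatch(json.loads(raw))
```

In a real agent loop you would feed `result` back to the model as a tool message and let it continue reasoning.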

Recommended settings

Note that ollama run does not accept sampling flags on the command line; set the parameters interactively after starting the model (or in a Modelfile):

ollama run fauxpaslife/nanbeige4.1
>>> /set parameter temperature 0.6
>>> /set parameter top_p 0.95
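
If you want these defaults baked in, a Modelfile sketch (the custom tag `my-nanbeige` is my own; temperature 0.6 and top_p 0.95 are the values recommended above):

```
FROM fauxpaslife/nanbeige4.1
PARAMETER temperature 0.6
PARAMETER top_p 0.95
```

Then build and run it with:

ollama create my-nanbeige -f Modelfile
ollama run my-nanbeige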

Notes

  • Native deep-search capability (rare for <10B models)
  • Sustained reasoning across complex problem chains
  • Strong preference alignment (beats much larger models)
  • Max context: 131K tokens

See technical report for full details.