3,667 2 weeks ago

Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors

tools thinking 3b
ollama run tomng/nanbeige4.1:3b

Details

2 weeks ago

dd2a8ce0f368 · 4.2GB ·

llama
·
3.93B
·
Q8_0
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
{ "min_p": 0.01, "repeat_penalty": 1, "stop": [ "<|im_start|>", "<|im_en
{{- $lastUserIdx := -1 -}} {{- range $idx, $msg := .Messages -}} {{- if eq $msg.Role "user" }}{{ $la

Readme

Nanbeige Logo

From mradermacher/Nanbeige4.1-3B-GGUF

Nanbeige4.1-3B is built upon Nanbeige4-3B-Base and represents an enhanced iteration of our previous reasoning model, Nanbeige4-3B-Thinking-2511, achieved through further post-training optimization with supervised fine-tuning (SFT) and reinforcement learning (RL). As a highly competitive open-source model at a small parameter scale, Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors.

Specifically, Nanbeige4.1-3B exhibits the following key strengths:

  • Strong Reasoning: Nanbeige4.1-3B is capable of solving complex, multi-step problems through sustained and coherent reasoning within a single forward pass, and reliably produces correct final answers on challenging tasks such as LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I.
  • Robust Preference Alignment: Nanbeige4.1-3B achieves solid alignment performance, outperforming not only same-scale models such as Qwen3-4B-2507 and Nanbeige4-3B-2511, but also substantially larger models including Qwen3-30B-A3B and Qwen3-32B on Arena-Hard-v2 and Multi-Challenge.
  • Agentic Capability: Nanbeige4.1-3B is the first general small model to natively support deep-search tasks and reliably sustain complex problem solving involving more than 500 rounds of tool invocations. It fills a long-standing gap in the small-model ecosystem where models are typically optimized for either general reasoning or agentic scenarios, but rarely excel at both.

Technical Report: Link