
From mradermacher/Nanbeige4.1-3B-GGUF
Nanbeige4.1-3B is built upon Nanbeige4-3B-Base and represents an enhanced iteration of our previous reasoning model, Nanbeige4-3B-Thinking-2511, achieved through further post-training optimization with supervised fine-tuning (SFT) and reinforcement learning (RL). As a highly competitive open-source model at a small parameter scale, Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors.

Specifically, Nanbeige4.1-3B exhibits the following key strengths:
- Strong Reasoning: Nanbeige4.1-3B is capable of solving complex, multi-step problems through sustained and coherent reasoning within a single forward pass, and reliably produces correct final answers on challenging tasks such as LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I.
- Robust Preference Alignment: Nanbeige4.1-3B achieves solid alignment performance, outperforming not only same-scale models such as Qwen3-4B-2507 and Nanbeige4-3B-2511, but also substantially larger models including Qwen3-30B-A3B and Qwen3-32B on Arena-Hard-v2 and Multi-Challenge.
- Agentic Capability: Nanbeige4.1-3B is the first general small model to natively support deep-search tasks and reliably sustain complex problem solving involving more than 500 rounds of tool invocations. It fills a long-standing gap in the small-model ecosystem where models are typically optimized for either general reasoning or agentic scenarios, but rarely excel at both.
Technical Report: Link