Llama-3.1-Nemotron-Nano-8B-v1 is a large language model (LLM) derived from Meta Llama-3.1-8B-Instruct. It is a reasoning model that is post-trained for reasoning, human chat preferences, and tasks such as RAG and tool calling.
This model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Reasoning, and Tool Calling, as well as multiple reinforcement learning (RL) stages using REINFORCE (RLOO) and Online Reward-aware Preference Optimization (RPO) algorithms for both chat and instruction following. The final model checkpoint is obtained by merging the final SFT and Online RPO checkpoints. The model was improved using Qwen.
Llama-3.1-Nemotron-Nano-8B-v1 is a general-purpose reasoning and chat model intended for use in English and coding languages. Other non-English languages (German, French, Italian, Portuguese, Hindi, Spanish, and Thai) are also supported. This model is ready for commercial use.
Version 1.0 (3/18/2025)
Reasoning is toggled via the system prompt: set it to `detailed thinking on` or `detailed thinking off`. All instructions should be contained within the user prompt. By default, thinking is on. The model may return an empty `<think></think>` block when no reasoning was necessary in Reasoning ON mode; this is expected behavior. For some prompts, even if thinking is disabled, the model emergently prefers to think before responding.
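The sketch below illustrates how this toggle is assumed to work with the Hugging Face `transformers` chat pipeline: the reasoning mode goes in the system turn and the task itself in the user turn. The checkpoint name, generation settings, and example prompt are assumptions for illustration, not an official quickstart.

```python
# Minimal sketch (assumptions: checkpoint name and generation settings).
import torch
from transformers import pipeline

model_id = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"  # assumed Hugging Face checkpoint name

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # Toggle reasoning here: "detailed thinking on" or "detailed thinking off".
    {"role": "system", "content": "detailed thinking on"},
    # Keep all task instructions in the user prompt, per the guidance above.
    {"role": "user", "content": "Solve x*(sin(x)+2)=0"},
]

outputs = generator(messages, max_new_tokens=1024)
# With thinking on, the reply typically begins with a <think>...</think> block.
print(outputs[0]["generated_text"][-1]["content"])
```

Switching the system prompt to `detailed thinking off` keeps the same message structure; only the system turn changes.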