Llama-3.1-Nemotron-Nano-8B-v1 is a large language model (LLM) derived from Meta Llama-3.1-8B-Instruct. It is a reasoning model that is post-trained for reasoning, human chat preferences, and tasks such as RAG and tool calling.
This model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Reasoning, and Tool Calling, as well as multiple reinforcement learning (RL) stages using REINFORCE (RLOO) and Online Reward-aware Preference Optimization (RPO) algorithms for both chat and instruction following. The final model checkpoint is obtained by merging the final SFT and Online RPO checkpoints. The model was improved using Qwen.
Llama-3.1-Nemotron-Nano-8B-v1 is a general-purpose reasoning and chat model intended for use in English and coding languages. Other non-English languages (German, French, Italian, Portuguese, Hindi, Spanish, and Thai) are also supported. This model is ready for commercial use.
Version 1.0 (3/18/2025)
Reasoning is toggled via the system prompt: set it to `detailed thinking on` or `detailed thinking off`. All instructions should be contained within the user prompt. By default, thinking is on. The model may return an empty `<think></think>` block when no reasoning was necessary in Reasoning ON mode; this is expected behavior. For some prompts, even if thinking is disabled, the model emergently prefers to think before responding.
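The sketch below illustrates how this toggle is assumed to work with the Hugging Face `transformers` chat pipeline: the reasoning mode goes in the system turn and the task itself in the user turn. The checkpoint name, generation settings, and example prompt are assumptions for illustration, not an official quickstart.

```python
# Minimal sketch (assumptions: checkpoint name and generation settings).
import torch
from transformers import pipeline

model_id = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"  # assumed Hugging Face checkpoint name

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # Toggle reasoning here: "detailed thinking on" or "detailed thinking off".
    {"role": "system", "content": "detailed thinking on"},
    # Keep all task instructions in the user prompt, per the guidance above.
    {"role": "user", "content": "Solve x*(sin(x)+2)=0"},
]

outputs = generator(messages, max_new_tokens=1024)
# With thinking on, the reply typically begins with a <think>...</think> block.
print(outputs[0]["generated_text"][-1]["content"])
```

Switching the system prompt to `detailed thinking off` keeps the same message structure; only the system turn changes.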