Randomblock1/nemotron-nano:8b-instruct-q6_K

7,231 pulls · updated 6 months ago

Llama 3.1 customized by NVIDIA into a reasoning model that is post-trained for reasoning, human chat preferences, and tasks such as RAG and tool calling. Significantly improved performance over the base Llama model.

tools · thinking · 8b


18578fa5fc1f · 6.6GB · llama · 8.03B parameters · Q6_K
template   <|start_header_id|>system<|end_header_id|> {{/* System prompt must be exactly 'detailed thinking on'
system     detailed thinking on
license    LLAMA 3.1 COMMUNITY LICENSE AGREEMENT Llama 3.1 Version Release Date: July 23, 2024 “Agreement”
params     { "temperature": 0.6, "top_p": 0.95 }

Readme


Llama-3.1-Nemotron-Nano-8B-v1 is a large language model (LLM) derived from Meta's Llama-3.1-8B-Instruct. It is a reasoning model that is post-trained for reasoning, human chat preferences, and tasks such as RAG and tool calling.

This model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Reasoning, and Tool Calling as well as multiple reinforcement learning (RL) stages using REINFORCE (RLOO) and Online Reward-aware Preference Optimization (RPO) algorithms for both chat and instruction-following. The final model checkpoint is obtained after merging the final SFT and Online RPO checkpoints. Improved using Qwen.

Intended use

Llama-3.1-Nemotron-Nano-8B-v1 is a general-purpose reasoning and chat model intended for use in English and coding languages. Other non-English languages (German, French, Italian, Portuguese, Hindi, Spanish, and Thai) are also supported. This model is ready for commercial use.

Model Version:

1.0 (3/18/2025)

Quick Start and Usage Recommendations:

  • Reasoning mode (ON/OFF) is controlled via the system prompt, which must be set to exactly detailed thinking on or detailed thinking off. All instructions should be contained within the user prompt. By default, thinking is on.
  • In Reasoning ON mode, the model may emit an empty <think></think> block when no reasoning is necessary; this is expected behavior.

For some prompts, even when thinking is disabled, the model emergently prefers to think before responding; this is also expected.
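Putting the recommendations above together, here is a minimal sketch of a request to a locally running Ollama server's /api/chat endpoint. It assumes the default endpoint on port 11434 and this page's model tag; the system prompt toggles reasoning mode, and the options block carries the temperature and top_p values this model build ships with.

```python
import json

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint (assumed)


def build_chat_request(user_prompt: str, thinking: bool = True) -> dict:
    """Build an Ollama /api/chat payload for this model.

    The system prompt must be exactly 'detailed thinking on' or
    'detailed thinking off'; all instructions go in the user message.
    """
    system = "detailed thinking on" if thinking else "detailed thinking off"
    return {
        "model": "Randomblock1/nemotron-nano:8b-instruct-q6_K",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_prompt},
        ],
        # Sampling parameters shipped with this model build.
        "options": {"temperature": 0.6, "top_p": 0.95},
        "stream": False,
    }


payload = build_chat_request("What is 17 * 24? Show your reasoning.")
print(json.dumps(payload, indent=2))

# To send it (requires a running Ollama server):
#   import requests
#   reply = requests.post(OLLAMA_CHAT_URL, json=payload).json()
#   print(reply["message"]["content"])
```

In Reasoning ON mode the reply's content begins with a <think>…</think> block; strip it if you only want the final answer.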

Reference

Hugging Face