A commercial-friendly small language model by NVIDIA optimized for roleplay, RAG QA, and function calling.

tools

ollama run tripolskypetr/nemotron-mini

curl http://localhost:11434/api/chat \
  -d '{
    "model": "tripolskypetr/nemotron-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='tripolskypetr/nemotron-mini',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'tripolskypetr/nemotron-mini',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Applications

Claude Code: ollama launch claude --model tripolskypetr/nemotron-mini

Codex App: ollama launch codex-app --model tripolskypetr/nemotron-mini

OpenClaw: ollama launch openclaw --model tripolskypetr/nemotron-mini

Hermes Agent: ollama launch hermes --model tripolskypetr/nemotron-mini

Codex: ollama launch codex --model tripolskypetr/nemotron-mini

OpenCode: ollama launch opencode --model tripolskypetr/nemotron-mini

Models

1 model

Name                  Size   Context  Input  Updated
nemotron-mini:latest  2.7GB  4K       Text   1 year ago

Readme

Nemotron-Mini-4B-Instruct

Nemotron-Mini-4B-Instruct generates responses for roleplay, retrieval-augmented generation (RAG), and function calling. It is a small language model (SLM) optimized through distillation, pruning, and quantization for speed and on-device deployment.
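
Since function calling is one of the stated use cases, the following is a minimal sketch of how a tool could be offered to the model through the same /api/chat endpoint used in the curl example. It assumes a local Ollama server that accepts a "tools" field in the request and returns "tool_calls" on the assistant message. The get_weather tool, the REGISTRY mapping, and the helper names are hypothetical and exist only for illustration; how reliably this particular model emits tool calls is not something the README states.

```python
import json
import urllib.request

MODEL = "tripolskypetr/nemotron-mini"


def get_weather(city: str) -> str:
    """Hypothetical tool, invented for this sketch; returns canned text."""
    return f"Sunny in {city}"


# JSON-schema style description of the tool, sent along with the chat request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return a short weather report for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def build_payload(user_text: str) -> dict:
    """Build a non-streaming chat request that advertises the tools."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_text}],
        "tools": TOOLS,
        "stream": False,
    }


def run_tool_calls(message: dict) -> list:
    """Execute each requested tool and return 'tool' role messages."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = fn["arguments"]
        if isinstance(args, str):  # tolerate JSON-encoded arguments
            args = json.loads(args)
        handler = REGISTRY.get(fn["name"])
        if handler is None:  # ignore tools we do not implement
            continue
        results.append({"role": "tool", "content": handler(**args)})
    return results


def send_chat(payload: dict, url: str = "http://localhost:11434/api/chat") -> dict:
    """POST the payload to a local Ollama server and return the reply message."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]
```

A typical loop would call send_chat(build_payload(...)), pass the returned message to run_tool_calls, append the assistant message and the tool results to the message list, and call the endpoint again so the model can produce a final answer.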

This instruct model is optimized for roleplay, RAG QA, and function calling in English. It supports a context length of 4,096 tokens. This model is ready for commercial use.
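
With a 4,096-token context, long roleplay or RAG conversations may need trimming before each request. The sketch below shows one possible approach: keep any system messages, then keep the most recent turns that fit the budget. The four-characters-per-token estimate, the reply reserve of 512 tokens, and the function names are assumptions made for illustration; the model's real tokenizer will count differently.

```python
CONTEXT_TOKENS = 4096  # context length stated in the README
CHARS_PER_TOKEN = 4    # assumption: crude heuristic, not the model's tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ceiling of len/4, never less than 1."""
    return max(1, -(-len(text) // CHARS_PER_TOKEN))


def trim_history(messages, reserve_for_reply=512, limit=CONTEXT_TOKENS):
    """Keep system messages plus the newest turns that fit the budget."""
    budget = limit - reserve_for_reply
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)

    kept = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(m["content"])
        if used + cost > budget:
            break  # everything older is dropped as well
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

The trimmed list can be passed as the messages argument of the chat calls shown earlier. Dropping whole turns from the oldest end keeps the remaining conversation contiguous, which tends to matter more for roleplay than preserving any single old message.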

References

Blog

HuggingFace