nemotron-mini:4b-instruct-fp16

693.7K Downloads · Updated 1 year ago

A commercial-friendly small language model by NVIDIA optimized for roleplay, RAG QA, and function calling.

tools 4b

ollama run nemotron-mini:4b-instruct-fp16

curl http://localhost:11434/api/chat \
  -d '{
    "model": "nemotron-mini:4b-instruct-fp16",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='nemotron-mini:4b-instruct-fp16',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'nemotron-mini:4b-instruct-fp16',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details


d0d806e9853c · 8.4GB

model

arch nemotron · parameters 4.19B · quantization F16

8.4GB
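The listed 8.4GB size is consistent with the parameter count and quantization shown above: F16 stores each weight in 2 bytes, so 4.19B parameters come to roughly 8.4 GB. The following is a rough check only; it ignores file metadata and any tensors stored at a different precision, and the function name is illustrative rather than part of any library.

```python
def estimate_weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# F16 uses 2 bytes per weight; 4.19B is the parameter count listed on this page.
size_gb = estimate_weight_size_gb(4.19e9)
print(f"{size_gb:.1f} GB")
```

The same estimate can serve as a lower bound on the memory needed to load the weights, before accounting for the context cache and runtime overhead.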

license

NVIDIA AI Foundation Models Community License Agreement IMPORTANT NOTICE – PLEASE READ AND AGREE B

15kB

template

{{- if (or .Tools .System) }}<extra_id_0>System {{ if .System }}{{ .System }} {{ end }} {{- if .Tool

773B

Readme

Nemotron-Mini-4B-Instruct is a small language model (SLM) for roleplay, retrieval-augmented generation (RAG) question answering, and function calling in English. It has been optimized through distillation, pruning, and quantization for speed and on-device deployment.

The model supports a context length of 4,096 tokens and is ready for commercial use.
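Since the readme highlights function calling and a 4,096-token context, a request body for the same /api/chat endpoint used in the examples above might look like the sketch below. It assumes Ollama's `tools`, `stream`, and `options.num_ctx` request fields; the `get_weather` tool and the `build_chat_request` helper are hypothetical names introduced only for illustration, and the block only builds and prints the JSON rather than contacting a server.

```python
import json

MODEL = "nemotron-mini:4b-instruct-fp16"
CONTEXT_LENGTH = 4096  # context length stated in the readme

# Hypothetical tool definition for illustration; not shipped with the model or Ollama.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}


def build_chat_request(prompt: str, tools: list) -> dict:
    """Build a JSON body for POST /api/chat with tool definitions attached."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {"num_ctx": CONTEXT_LENGTH},  # cap the context window at 4,096 tokens
    }


body = build_chat_request("What is the weather in Paris?", [GET_WEATHER_TOOL])
print(json.dumps(body, indent=2))
```

Sent to http://localhost:11434/api/chat, a body like this should yield a reply whose message carries a `tool_calls` list when the model chooses to call the tool, and plain `content` otherwise; the application is then responsible for running the tool and returning its result in a follow-up message.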
