364 23 hours ago

NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows.

vision tools thinking audio 33b
ollama run nemotron3:33b

Applications

Claude Code
Claude Code ollama launch claude --model nemotron3:33b
Codex
Codex ollama launch codex --model nemotron3:33b
OpenCode
OpenCode ollama launch opencode --model nemotron3:33b
OpenClaw
OpenClaw ollama launch openclaw --model nemotron3:33b
Hermes Agent
Hermes Agent ollama launch hermes --model nemotron3:33b

Models

View all →

Readme

NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows. It extends the Nemotron Nano family with integrated video+speech comprehension, Graphical User Interface (GUI), Optical Character Recognition (OCR), and speech transcription capabilities, enabling end-to-end processing of rich enterprise content such as meeting recordings, M&E assets, training videos, and complex business documents. NVIDIA Nemotron 3 Nano Omni was developed by NVIDIA as part of the Nemotron model family.

This model is available for commercial use.

This model was improved using Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen2.5-VL-72B-Instruct, and gpt-oss-120b.

License/Terms of Use

Governing Terms: Use of this model is governed by the NVIDIA Open Model Agreement

Use Case

This model is designed for enterprise customers requiring multimodal understanding capabilities. Expected users include: - Customer service applications (e.g., Doordash video of drop-off at a given address via OCR, drive-thru order verification) - Media and Entertainment (M&E) — video and speech analysis, dense captions, video search and summarization - Document intelligence for AI assistants (contracts, SOW/MSA, scientific discovery, financial documents) - GUI automation for AI agentic applications (incident management, agentic search, browser agents, email agents)