11 Downloads Updated 3 weeks ago
ollama run iliafed/nemotron-quant-0t
6066c08369de · 24GB
nemotron-quant-0t is a custom Ollama-ready model built on top of nemotron-cascade-2, optimized for efficient local inference, lower memory usage, and practical real-world deployment. It is configured with a default temperature of 0 for deterministic output.
It is designed for users who want a strong modern language model experience in Ollama with a better balance between performance, responsiveness, and hardware requirements.
This model is a customized and quant-optimized variant of nemotron-cascade-2.
The goal of this release is to make the base model more accessible for local use while preserving strong generation quality and stable behavior.
This version has also been modernized with Google's SuperQuant technology, improving quantization efficiency and reducing resource usage while keeping the model practical for everyday workloads.
nemotron-quant-0t is aimed at users who want a modern Nemotron-based local model that is easier to run in practice than a raw full-weight setup.
Rather than targeting maximum theoretical size or complexity, nemotron-quant-0t focuses on practical usability. Many local users need a model that is not just powerful, but actually convenient to run.
nemotron-quant-0t was built with that in mind: to provide a cleaner local deployment experience while keeping the strengths of the Nemotron family.
ollama pull iliafed/nemotron-quant-0t
ollama run iliafed/nemotron-quant-0t
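Beyond the CLI, the model can be queried programmatically through Ollama's local HTTP API. The sketch below builds a request payload for the /api/generate endpoint, with temperature set to 0 to match this model's deterministic default; the prompt text is an arbitrary placeholder, and the example assumes a locally running Ollama server on the default port 11434.

```python
import json

def build_generate_request(prompt: str) -> dict:
    # Payload for Ollama's /api/generate endpoint; model name is taken
    # from this page, and temperature 0 mirrors the model's default.
    return {
        "model": "iliafed/nemotron-quant-0t",
        "prompt": prompt,
        "stream": False,  # return one complete response instead of chunks
        "options": {"temperature": 0},
    }

payload = build_generate_request("Summarize quantization in one sentence.")
print(json.dumps(payload, indent=2))
# POST this JSON to http://localhost:11434/api/generate once the model
# has been pulled and the Ollama server is running.
```

The same options dictionary also accepts other sampling parameters (such as top_p or num_ctx) if you want to override the model's defaults per request.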
This is a custom release intended for the Ollama ecosystem.
If you are looking for a Nemotron-derived model that emphasizes practical local performance, quant efficiency, and easy deployment, this model is worth trying.