iliafed/nemotron3-quant

36 Downloads · Updated 3 weeks ago

vision · tools · thinking · audio

ollama run iliafed/nemotron3-quant

curl http://localhost:11434/api/chat \
  -d '{
    "model": "iliafed/nemotron3-quant",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='iliafed/nemotron3-quant',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'iliafed/nemotron3-quant',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Applications

- Claude Code: `ollama launch claude --model iliafed/nemotron3-quant`
- Codex App: `ollama launch codex-app --model iliafed/nemotron3-quant`
- OpenClaw: `ollama launch openclaw --model iliafed/nemotron3-quant`
- Hermes Agent: `ollama launch hermes --model iliafed/nemotron3-quant`
- Codex: `ollama launch codex --model iliafed/nemotron3-quant`
- OpenCode: `ollama launch opencode --model iliafed/nemotron3-quant`

Models

1 model

| Name | Size | Context | Input | Updated |
|---|---|---|---|---|
| nemotron3-quant:latest | 28GB | 128K | Text, Image | 3 weeks ago |

Readme

# iliafed/nemotron3-quant

Quantized Ollama build of `nemotron3:33b`, configured for large-context local inference and TurboQuant-style KV-cache compression.

## Model

- Base model: `nemotron3:33b`
- Architecture: `nemotron_h_omni`
- Parameters: 33B
- Weight quantization: `Q4_K_M`
- Configured context: `262144`
- Capabilities: text completion, vision, audio, tools, thinking
- Renderer/parser: `nemotron-3-nano`
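Since vision is among the listed capabilities, the following is a minimal sketch of how an image could be attached to a chat request over the local HTTP API. It assumes an Ollama server on the default `http://localhost:11434` address (the same endpoint used in the curl example); the helper names `build_vision_request` and `send` are made up for illustration and are not part of any library.

```python
import base64
import json
import urllib.request

MODEL = "iliafed/nemotron3-quant"
# Assumption: default local Ollama endpoint, as in the curl example.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_vision_request(prompt: str, image_bytes: bytes) -> dict:
    """Build an /api/chat payload that attaches one image to a user message."""
    return {
        "model": MODEL,
        "stream": False,  # ask for one JSON object instead of a stream
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # The chat API takes images as base64-encoded strings.
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
    }


def send(payload: dict) -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

With the server running, something like `send(build_vision_request("Describe this image.", open("photo.png", "rb").read()))` would return the model's reply; `photo.png` is a hypothetical local file.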

## Run

```bash
ollama run iliafed/nemotron3-quant
```

Or pull first:

```bash
ollama pull iliafed/nemotron3-quant
```

## Included Ollama parameters

```
PARAMETER num_ctx 262144
PARAMETER temperature 1
PARAMETER top_p 0.95
```

## Recommended TurboQuant KV-cache runtime

For lower memory use at very large context sizes, run Ollama with Flash Attention and compressed KV cache:

```powershell
setx OLLAMA_FLASH_ATTENTION 1
setx OLLAMA_KV_CACHE_TYPE "tbqp3/tbq3"
setx OLLAMA_CONTEXT_LENGTH 262144
```

Restart Ollama after setting these variables.

Temporary PowerShell session:

```powershell
$env:OLLAMA_FLASH_ATTENTION="1"
$env:OLLAMA_KV_CACHE_TYPE="tbqp3/tbq3"
$env:OLLAMA_CONTEXT_LENGTH="262144"
ollama serve
```

## Notes

`OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3` is a runtime/server setting, not a model file setting. The model itself contains the configured sampling and context parameters, while KV-cache compression must be enabled on the machine running Ollama.

This is not a fine-tune. It is a quantized Ollama packaging of Nemotron 3 33B with large-context defaults.
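The split between server-side settings and model-side parameters can be sketched in Python. This is an illustration under assumptions, not the packaging's own tooling: `server_env` and `chat_request` are hypothetical helper names, and the sketch does not check whether the installed Ollama build accepts the `tbqp3/tbq3` cache type. The environment variables belong to the `ollama serve` process, while `num_ctx`, `temperature`, and `top_p` travel with the model and can be overridden per request through the API's `options` field.

```python
import os

MODEL = "iliafed/nemotron3-quant"


def server_env(base_env=None) -> dict:
    """Environment for the `ollama serve` process (runtime-only settings)."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "OLLAMA_FLASH_ATTENTION": "1",
            "OLLAMA_KV_CACHE_TYPE": "tbqp3/tbq3",
            "OLLAMA_CONTEXT_LENGTH": "262144",
        }
    )
    return env


def chat_request(prompt: str, **overrides) -> dict:
    """Chat payload whose `options` start from the packaged PARAMETER values."""
    options = {"num_ctx": 262144, "temperature": 1, "top_p": 0.95}
    options.update(overrides)  # per-request overrides, e.g. a smaller num_ctx
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "options": options,
    }
```

For example, `subprocess.Popen(["ollama", "serve"], env=server_env())` would start the server with the compressed-cache settings, and `chat_request("Hello!", num_ctx=8192)` builds a payload for `/api/chat` that requests a smaller context for that call only. The KV-cache type never appears in the request, because it is not a model or request parameter.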
