Applications

  • Claude Code: ollama launch claude --model iliafed/nemotron3-quant
  • Codex App: ollama launch codex-app --model iliafed/nemotron3-quant
  • OpenClaw: ollama launch openclaw --model iliafed/nemotron3-quant
  • Hermes Agent: ollama launch hermes --model iliafed/nemotron3-quant
  • Codex: ollama launch codex --model iliafed/nemotron3-quant
  • OpenCode: ollama launch opencode --model iliafed/nemotron3-quant

iliafed/nemotron3-quant

Quantized Ollama build of nemotron3:33b, configured for large-context local inference and TurboQuant-style KV-cache compression.

Model

  • Base model: nemotron3:33b
  • Architecture: nemotron_h_omni
  • Parameters: 33B
  • Weight quantization: Q4_K_M
  • Configured context: 262144
  • Capabilities: text completion, vision, audio, tools, thinking
  • Renderer/parser: nemotron-3-nano

Run

```bash
ollama run iliafed/nemotron3-quant
```

Or pull first:

```bash
ollama pull iliafed/nemotron3-quant
```

Included Ollama parameters

```text
PARAMETER num_ctx 262144
PARAMETER temperature 1
PARAMETER top_p 0.95
```

Recommended TurboQuant KV-cache runtime

For lower memory use at very large context sizes, run Ollama with Flash Attention and compressed KV cache:

```powershell
setx OLLAMA_FLASH_ATTENTION 1
setx OLLAMA_KV_CACHE_TYPE "tbqp3/tbq3"
setx OLLAMA_CONTEXT_LENGTH 262144
```

Restart Ollama after setting these variables.

Temporary PowerShell session:

```powershell
$env:OLLAMA_FLASH_ATTENTION="1"
$env:OLLAMA_KV_CACHE_TYPE="tbqp3/tbq3"
$env:OLLAMA_CONTEXT_LENGTH="262144"
ollama serve
```

Notes

OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3 is a runtime/server setting, not a model file setting. The model itself contains the configured sampling and context parameters, while KV-cache compression must be enabled on the machine running Ollama.
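Because the KV-cache settings live in the server's environment rather than in the model, a launcher script can apply them on any platform. The following is a minimal Python sketch of that idea, not part of the original package. The helper names `build_server_env` and `start_server` are hypothetical, and the value `tbqp3/tbq3` is copied from this README on the assumption that the installed Ollama build accepts it.

```python
import os
import subprocess

# Values copied from this README. "tbqp3/tbq3" is taken as-is and is assumed
# to be a cache type the installed Ollama build understands.
KV_CACHE_SETTINGS = {
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "tbqp3/tbq3",
    "OLLAMA_CONTEXT_LENGTH": "262144",
}


def build_server_env(base_env=None):
    """Return a copy of base_env (default: os.environ) with the settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(KV_CACHE_SETTINGS)
    return env


def start_server():
    """Start `ollama serve` as a child process with the settings above.

    Hypothetical helper: requires the `ollama` binary on PATH.
    """
    return subprocess.Popen(["ollama", "serve"], env=build_server_env())
```

Calling `start_server()` is roughly equivalent to the temporary PowerShell session: the variables apply only to the spawned server process and are not persisted.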

This is not a fine-tune. It is a quantized Ollama packaging of Nemotron 3 33B with large-context defaults.
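The sampling and context defaults baked into the model can also be overridden per request through the `options` field of Ollama's `/api/generate` endpoint, which helps when a 262144-token context does not fit in memory. The sketch below is illustrative only: the function names are hypothetical, and the host assumes a default local server on port 11434.

```python
import json
import urllib.request

MODEL = "iliafed/nemotron3-quant"

# Defaults listed under "Included Ollama parameters".
DEFAULT_OPTIONS = {"num_ctx": 262144, "temperature": 1, "top_p": 0.95}


def build_generate_request(prompt, **overrides):
    """Build a non-streaming /api/generate payload, overriding defaults as needed."""
    options = {**DEFAULT_OPTIONS, **overrides}
    return {"model": MODEL, "prompt": prompt, "stream": False, "options": options}


def generate(prompt, host="http://localhost:11434", **overrides):
    """Send the request to a running Ollama server and return the generated text.

    Assumes the server is reachable at `host` and the model has been pulled.
    """
    payload = json.dumps(build_generate_request(prompt, **overrides)).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

For example, `generate("Summarize this file.", num_ctx=32768)` would request a smaller context window for a single call while keeping the other defaults.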