36 Downloads Updated 3 weeks ago
ollama run iliafed/nemotron3-quant
Quantized Ollama build of nemotron3:33b, configured for large-context local inference and TurboQuant-style KV-cache compression.
`nemotron3:33b`, `nemotron_h_omni`, `Q4_K_M`, `262144`, `nemotron-3-nano`

```bash
ollama run iliafed/nemotron3-quant
```

Or pull first:
```bash
ollama pull iliafed/nemotron3-quant
```

## Included Ollama parameters

```
PARAMETER num_ctx 262144
PARAMETER temperature 1
PARAMETER top_p 0.95
```

## Recommended TurboQuant KV-cache runtime

For lower memory use at very large context sizes, run Ollama with Flash Attention and a compressed KV cache:
```powershell
setx OLLAMA_FLASH_ATTENTION 1
setx OLLAMA_KV_CACHE_TYPE "tbqp3/tbq3"
setx OLLAMA_CONTEXT_LENGTH 262144
```

Restart Ollama after setting these variables.
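The `setx` commands are Windows-specific. On Linux or macOS, a rough equivalent for the current shell might look like the sketch below. The function name `configure_ollama_kv_cache` is hypothetical, and the example assumes your Ollama build accepts the `tbqp3/tbq3` cache type exactly as given above.

```bash
# Hypothetical helper: export the same three runtime settings for this shell.
# Assumption: the installed Ollama build recognizes the "tbqp3/tbq3" value.
configure_ollama_kv_cache() {
  export OLLAMA_FLASH_ATTENTION=1
  export OLLAMA_KV_CACHE_TYPE="tbqp3/tbq3"
  export OLLAMA_CONTEXT_LENGTH=262144
}

configure_ollama_kv_cache

# Start the server from this same shell so it inherits the variables:
# ollama serve
```

The variables only apply to processes started from this shell. If Ollama runs as a background service, they need to be set in that service's environment instead.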
Temporary PowerShell session:
```powershell
$env:OLLAMA_FLASH_ATTENTION="1"
$env:OLLAMA_KV_CACHE_TYPE="tbqp3/tbq3"
$env:OLLAMA_CONTEXT_LENGTH="262144"
ollama serve
```

## Notes

`OLLAMA_KV_CACHE_TYPE=tbqp3/tbq3` is a runtime/server setting, not a model file setting. The model itself contains the configured sampling and context parameters, while KV-cache compression must be enabled on the machine running Ollama.
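To illustrate the split between model-level and server-level settings, here is a minimal sketch of a request body for Ollama's `/api/generate` endpoint. The helper name `build_generate_payload` is hypothetical. The `options` object repeats the sampling and context values the model already ships with, and the KV-cache type is left out because it is set on the server, as described above.

```bash
# Hypothetical helper: print a JSON body for POST /api/generate.
# Assumption: the prompt contains no quotes or backslashes (no JSON escaping is done).
build_generate_payload() {
  local prompt="$1"
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_ctx":%d,"temperature":%s,"top_p":%s}}\n' \
    "iliafed/nemotron3-quant" "$prompt" 262144 1 0.95
}

# Example call against a local server on the default port:
# curl -s http://localhost:11434/api/generate -d "$(build_generate_payload 'Hello')"
```

Because the model already carries these parameters, the `options` object can be omitted entirely. It is shown here only to make clear which settings travel with a request.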
This is not a fine-tune. It is a quantized Ollama packaging of Nemotron 3 33B with large-context defaults.
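If the 262144-token default is more than your machine can hold, one option is a local variant that changes only the context window. The sketch below is an assumption about how you might do that, not part of the published model: the variant name `nemotron3-quant-32k` and the value `32768` are placeholders.

```bash
# Write a Modelfile that reuses the published model and lowers only num_ctx.
# The 32768 value is an illustrative placeholder; pick what fits your memory.
cat > Modelfile <<'EOF'
FROM iliafed/nemotron3-quant
PARAMETER num_ctx 32768
EOF

# Build the local variant (requires a working Ollama install); the name is hypothetical.
# ollama create nemotron3-quant-32k -f Modelfile
```

The other packaged parameters (`temperature`, `top_p`) are inherited from the base model, so only the overridden line needs to appear.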