Possibly useful for agentic AI systems. Apparently compatible with ~16GB of VRAM.

ollama run mirage335/Nemotron-3-Nano-30B-A3B-virtuoso
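If unsure whether the model will fit, available GPU memory can be checked first; a minimal sketch, assuming an NVIDIA GPU with nvidia-smi available (the ~16GB figure above is approximate):

# Report per-GPU total and free VRAM.
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv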


Licensed by NVIDIA Corporation under the NVIDIA Nemotron Model License.


Usage

# Pull from the mirage335 namespace, retag locally without the prefix, and drop the namespaced tag.
ollama_pull_virtuoso() {
    ollama pull mirage335/"$1"
    ollama cp mirage335/"$1" "$1"
    ollama rm mirage335/"$1"
}

ollama_pull_virtuoso Nemotron-3-Nano-30B-A3B-virtuoso
echo "FROM Nemotron-3-Nano-30B-A3B-virtuoso:latest" > Modelfile-128k
echo "PARAMETER num_ctx 131072" >> Modelfile-128k
echo "PARAMETER num_keep 131072" >> Modelfile-128k
echo "PARAMETER num_predict 131072" >> Modelfile-128k
echo "PARAMETER num_gpu 999" >> Modelfile-128k
ollama create Nemotron-3-Nano-30B-A3B-128k-virtuoso -f Modelfile-128k
rm -f Modelfile-128k

echo "FROM Nemotron-3-Nano-30B-A3B-virtuoso:latest" > Modelfile-256k
echo "PARAMETER num_ctx 262144" >> Modelfile-256k
echo "PARAMETER num_keep 262144" >> Modelfile-256k
echo "PARAMETER num_predict 262144" >> Modelfile-256k
echo "PARAMETER num_gpu 999" >> Modelfile-256k
ollama create Nemotron-3-Nano-30B-A3B-256k-virtuoso -f Modelfile-256k
rm -f Modelfile-256k

# 1M / 1024k context (1048576 tokens)
echo "FROM Nemotron-3-Nano-30B-A3B-virtuoso:latest" > Modelfile-1M
echo "PARAMETER num_ctx 1048576" >> Modelfile-1M
echo "PARAMETER num_keep 1048576" >> Modelfile-1M
echo "PARAMETER num_predict 1048576" >> Modelfile-1M
echo "PARAMETER num_gpu 999" >> Modelfile-1M
ollama create Nemotron-3-Nano-30B-A3B-1M-virtuoso -f Modelfile-1M
rm -f Modelfile-1M
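
To confirm the parameters took effect, the created models can be inspected; a quick sanity check using standard ollama commands:

# List the new tags, then show the baked-in parameters of one of them.
ollama list | grep virtuoso
ollama show Nemotron-3-Nano-30B-A3B-128k-virtuoso --modelfile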

Recommended environment variables. KV cache quantization “q4_0” in particular is RECOMMENDED, unless “q8_0” is needed (e.g. by Qwen-2_5-VL-7B-Instruct-virtuoso, etc).

export OLLAMA_NUM_THREADS=18
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE="q4_0"
export OLLAMA_NEW_ENGINE=true
export OLLAMA_NOHISTORY=true
export OLLAMA_NUM_PARALLEL=1
export OLLAMA_MAX_LOADED_MODELS=1
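
These variables must be visible to the ollama server process, not merely the client shell. A minimal sketch, assuming the server is started manually rather than as a service:

OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE="q4_0" ollama serve

For a systemd-managed install, set them instead as Environment= entries in a service override (e.g. via systemctl edit ollama), then restart the service.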

Adjust OLLAMA_NUM_THREADS and/or disable HyperThreading, etc., to prevent crippling performance loss.
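As a starting point, matching the thread count to the number of physical cores (excluding HyperThreading siblings) often avoids the worst of this; a sketch, assuming Linux with lscpu available:

# Count unique physical cores (Core,Socket pairs), skipping lscpu comment lines.
physical_cores=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)
export OLLAMA_NUM_THREADS="$physical_cores"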

CAUTION - Preservation

Pulling the model this way relies on the ollama.com registry and, more generally, on the reliability of internet services, which has proven significantly fragile.

If possible, use the “Llama-3-virtuoso” project instead, which automatically caches a backup copy that can be installed offline.

https://github.com/mirage335-colossus/Llama-3-virtuoso
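
As an additional stopgap, the already-pulled model store can be archived locally; a rough sketch, assuming Ollama's default Linux storage location (~/.ollama/models):

# Archive the local model store so it can be restored without re-downloading.
tar -czf ollama-models-backup.tar.gz -C "$HOME/.ollama" models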