Built with Llama
Llama 3.3 is licensed under the Llama 3.3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
Licensed by NVIDIA Corporation under the NVIDIA Open Model License
NOTICE
This model may be able to compare responses from other LLMs and explain differences in their quality.
IQ2_XXS - Apparently adequate. Compatible with ~16GB VRAM.
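ollama_pull_virtuoso() {
    # Pull under the mirage335/ namespace, copy to the short name, then drop the namespaced copy.
    # Chained with && so that the pulled copy is not removed if the copy step fails.
    ollama pull mirage335/"$1" &&
    ollama cp mirage335/"$1" "$1" &&
    ollama rm mirage335/"$1"
}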
ollama_pull_virtuoso Llama-3_3-Nemotron-Super-49B-GenRM-Multilingual-virtuoso
_experiment-Llama-3_3-Nemotron-Super-49B-GenRM-Multilingual() {
    # One user prompt followed by two candidate responses for the model to compare.
    # Ollama's native /api/chat API reads sampling parameters from "options";
    # "num_predict" is its name for the maximum number of generated tokens.
    _stopwatch curl -X POST http://localhost:11434/api/chat \
        -H "Content-Type: application/json" \
        -d '{
"model": "Llama-3_3-Nemotron-Super-49B-GenRM-Multilingual-virtuoso",
"messages": [
{ "role": "user", "content": "Tell me about Canada." },
{ "role": "assistant", "content": "Here is a brief fact about Canada:\n\nCanada is home to more lakes than any other country in the world, with over 2 million lakes covering about 8 % of its land area." },
{ "role": "assistant", "content": "Here is a brief fact about Canada:\n\nCanada has more lakes than any other country in the world, with over 2 million lakes covering about 8% of its land area." }
],
"stream": false,
"options": {
"temperature": 0.04,
"top_k": 7,
"num_predict": 2048
}
}'
    echo
}
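The function above calls _stopwatch, which is not defined on this page and presumably comes from the author's own shell environment. If it is unavailable, a minimal stand-in could look like the sketch below. Its behavior (run the command, report elapsed wall-clock seconds on stderr, preserve the exit status) is an assumption, not the author's implementation.

```shell
# Hypothetical stand-in for _stopwatch (assumed behavior, not the original helper):
# runs the given command, prints elapsed wall-clock seconds to stderr,
# and returns the command's own exit status.
_stopwatch() {
    local start end status
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s" >&2
    return "$status"
}

# Example: time an arbitrary command.
_stopwatch sleep 1
```

Writing the timing to stderr keeps the JSON response on stdout clean if it is piped into another tool.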
Recommended environment variables. KV cache quantization “q4_0” in particular is RECOMMENDED, unless “q8_0” is needed (e.g. by Qwen-2_5-VL-7B-Instruct-virtuoso).
export OLLAMA_NUM_THREADS=18
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE="q4_0"
export OLLAMA_NEW_ENGINE=true
export OLLAMA_NOHISTORY=true
export OLLAMA_NUM_PARALLEL=1
export OLLAMA_MAX_LOADED_MODELS=1
Adjust OLLAMA_NUM_THREADS to suit your CPU, and/or disable HyperThreading, to prevent crippling performance loss.
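One possible starting point for the thread count, an assumption rather than something stated here, is one thread per physical core, so that HyperThreading siblings are not counted. A minimal sketch on Linux follows. The helper name _physical_cores is hypothetical, and the value should still be tuned by measurement.

```shell
# Hypothetical helper: prints the number of physical CPU cores.
# Falls back to the logical CPU count when lscpu is unavailable or reports nothing.
_physical_cores() {
    local cores=""
    if command -v lscpu >/dev/null 2>&1; then
        # lscpu -p prints one CSV row per logical CPU; unique "core,socket"
        # pairs correspond to physical cores. Header lines start with '#'.
        cores=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l)
    fi
    if [ -z "$cores" ] || [ "$cores" -lt 1 ]; then
        cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    fi
    echo "$cores"
}

# Assumption: one thread per physical core as an initial value.
export OLLAMA_NUM_THREADS="$(_physical_cores)"
echo "OLLAMA_NUM_THREADS=$OLLAMA_NUM_THREADS"
```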
Pulling the model this way relies on the ollama repository and, more generally, on the reliability of internet services, which has been rather fragile.
If possible, you should use the “Llama-3-virtuoso” project instead, which automatically caches a backup copy that can be installed automatically.
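When pulling directly is the only option, transient network failures can be partly worked around by retrying. The sketch below is a generic retry wrapper. The helper name _retry and the attempt and delay values are hypothetical, and it does not replace the cached backup that the “Llama-3-virtuoso” project provides.

```shell
# Hypothetical helper: retries a command until it succeeds or attempts run out.
# Usage: _retry <attempts> <delay_seconds> <command> [args...]
_retry() {
    local attempts=$1 delay=$2 n=1
    shift 2
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Example (commented out because it needs a running ollama installation):
# _retry 5 30 ollama pull mirage335/Llama-3_3-Nemotron-Super-49B-GenRM-Multilingual-virtuoso
```

The example retries the bare ollama pull step, so that it is the pull's own exit status that decides whether another attempt is made.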