Merged GGUF files (104GB) from Unsloth's Q3_K_XL quant, using the default Qwen3:235b modelfile with the settings recommended by Qwen, slightly adjusted to the temperature and top_p defaults recommended by Unsloth (near the end of that page).
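For reference, a minimal sketch of what those parameters look like in a modelfile, assuming the non-thinking defaults that Qwen and Unsloth recommend (temperature 0.7, top_p 0.8, top_k 20, min_p 0); check the linked pages for the authoritative values. TEMPLATE and the other fields carry over from the original modelfile and are omitted here:

FROM ./newmodel.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER min_p 0.0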
Find the thinking model here.
Benchmarks on an Apple Mac Studio M4 Max 128GB (16-core CPU / 40-core GPU) while doing basic home office work in parallel.
This model took ~50 seconds to load right after downloading. After a restart it takes around 15 seconds to load, and once it has gone inactive for a while, it takes between 6 and 10 seconds to wake up and accept input to ollama run.
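If the wake-up delay matters, the model can be kept resident; a minimal sketch using Ollama's keep-alive setting, where -1 means keep the model loaded indefinitely and the model name is a placeholder:

# server-wide: keep loaded models in memory indefinitely
OLLAMA_KEEP_ALIVE=-1 ollama serve
# or per request via the API (an empty generate request just loads the model)
curl http://localhost:11434/api/generate -d '{"model": "newmodel", "keep_alive": -1}'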
Prompt:
Explain quantum computing like I am 10. Build 10 topics and explain them with 5 sentences each.
VRAM: 108 GB
total duration: 57.875090667s
load duration: 30.372542ms
prompt eval count: 35 token(s)
prompt eval duration: 552.349291ms
prompt eval rate: 63.37 tokens/s
eval count: 1032 token(s)
eval duration: 57.291978417s
eval rate: 18.01 tokens/s
VRAM: 111 GB
total duration: 54.706218375s
load duration: 30.177458ms
prompt eval count: 35 token(s)
prompt eval duration: 800.697375ms
prompt eval rate: 43.71 tokens/s
eval count: 976 token(s)
eval duration: 53.874746708s
eval rate: 18.12 tokens/s
VRAM: 117 GB
total duration: 48.719248791s
load duration: 32.634958ms
prompt eval count: 35 token(s)
prompt eval duration: 800.158375ms
prompt eval rate: 43.74 tokens/s
eval count: 877 token(s)
eval duration: 47.885500208s
eval rate: 18.31 tokens/s
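For reference, stats like the ones above come from Ollama's --verbose flag; a sketch, with the model name as a placeholder:

ollama run newmodel --verbose "Explain quantum computing like I am 10. Build 10 topics and explain them with 5 sentences each."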
Prompt:
Summarize this into 5 topics with 5 sentences each: this blog post with 2009 tokens
VRAM: 108 GB
total duration: 1m9.037277875s
load duration: 32.804167ms
prompt eval count: 2048 token(s)
prompt eval duration: 12.508220667s
prompt eval rate: 163.73 tokens/s
eval count: 757 token(s)
eval duration: 56.495572083s
eval rate: 13.40 tokens/s
VRAM: 111 GB
total duration: 1m12.05915525s
load duration: 32.846292ms
prompt eval count: 2048 token(s)
prompt eval duration: 12.573978083s
prompt eval rate: 162.88 tokens/s
eval count: 736 token(s)
eval duration: 59.451638709s
eval rate: 12.38 tokens/s
VRAM: 117 GB
total duration: 1m10.380066708s
load duration: 33.03275ms
prompt eval count: 2048 token(s)
prompt eval duration: 12.561087333s
prompt eval rate: 163.04 tokens/s
eval count: 721 token(s)
eval duration: 57.7849875s
eval rate: 12.48 tokens/s
Prompt:
Summarize this into 5 topics with 5 sentences each: this blog post with 4683 tokens
VRAM: 117 GB
total duration: 2m3.843154916s
load duration: 30.491875ms
prompt eval count: 4787 token(s)
prompt eval duration: 32.470091667s
prompt eval rate: 147.43 tokens/s
eval count: 772 token(s)
eval duration: 1m31.341756s
eval rate: 8.45 tokens/s
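As a sanity check: total duration is roughly load + prompt eval + eval duration (0.03s + 32.47s + 91.34s ≈ 2m3.8s for this run), and each rate is just count divided by duration, e.g. 772 tokens / 91.34s ≈ 8.45 tokens/s.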
Note to self on how I merged, updated and pushed this model to the Ollama library:
./llama-gguf-split --merge downloaded-model-00001-of-00003.gguf newmodel.gguf
ollama show modelname --modelfile > original-modelfile.txt
Edit the saved modelfile into new-modelfile.txt, pointing FROM … at the merged newmodel.gguf and keeping the recommended parameters.
ollama create newmodel --file new-modelfile.txt
ollama cp newmodel username/newmodel
ollama push username/newmodel
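Before pushing, it is worth verifying the new model locally; a minimal sketch, with names as above:

ollama show newmodel --modelfile
ollama run newmodel --verbose "Hi"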