41 Downloads · Updated 1 year ago
c4e976a8ea85 · 53GB
Two quantizations (Q4_K_M and Q5_K_M) of Hermes 2 Pro Llama 3 70B. Following instructions published by Robert Sinclair, the output tensor and the token-embedding tensor are kept at f16 while the remaining weights are quantized.
latest (Q4_K_M):
llama-quantize --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q4.gguf q4_k
f16.q5 (Q5_K_M):
llama-quantize --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q5.gguf q5_k
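Both variants come from the same f16 source and differ only in the final type argument, so they can be produced in one pass. The following is a minimal sketch, assuming llama-quantize from llama.cpp is on PATH and the source file is named like model.f16.gguf. The function name quantize_all and the DRY_RUN switch are hypothetical helpers introduced here for illustration; they are not part of llama.cpp.

```shell
#!/bin/sh
# Hypothetical helper: builds the q4_k and q5_k variants from one f16 GGUF,
# using the same llama-quantize flags as the commands above.
# Assumes llama-quantize is on PATH. Set DRY_RUN=1 to print the commands
# instead of running them.

quantize_all() {
  src=$1                  # f16 source model, e.g. model.f16.gguf
  base=${src%.gguf}       # strip the extension: model.f16
  for q in 4 5; do
    # Keep output and token-embedding tensors at f16; quantize the rest.
    set -- llama-quantize --allow-requantize \
      --output-tensor-type f16 --token-embedding-type f16 \
      "$src" "$base.q$q.gguf" "q${q}_k"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$@"
    else
      "$@" || return 1    # stop at the first failed quantization
    fi
  done
}
```

For example, `DRY_RUN=1 quantize_all model.f16.gguf` prints the two commands shown above, and `quantize_all model.f16.gguf` runs them, writing model.f16.q4.gguf and model.f16.q5.gguf.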