
Quantized Mistral 7B Instruct models optimized for fast, CPU-only local inference with Ollama. Multiple variants are available, balancing speed, quality, and memory efficiency.
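To try a variant locally, here is a minimal sketch using the Python ollama client (pip install ollama) against a running Ollama server. The model tag mistral-7b-cpu is a placeholder for illustration, not a published name; substitute the tag of whichever variant you pull.

import ollama

# Chat with the model; the server applies the baked-in parameters shown below.
response = ollama.chat(
    model="mistral-7b-cpu",  # placeholder tag, substitute your variant
    messages=[{"role": "user", "content": "Summarize what Ollama does."}],
)
print(response["message"]["content"])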

Parameters · 5e9d51c15011 · 215B
{
  "num_batch": 512,
  "num_ctx": 1024,
  "num_gpu": 0,
  "num_predict": 256,
  "num_thread": 8,
  "repeat_penalty": 1.1,
  "stop": [
    "<|im_start|>",
    "<|im_end|>",
    "</s>"
  ],
  "temperature": 0.7,
  "top_k": 40,
  "top_p": 0.95
}
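In this parameter set, "num_gpu": 0 disables GPU offload entirely (forcing CPU inference) and "num_thread": 8 pins the CPU thread count. These defaults can be overridden per request; a minimal sketch with the Python ollama client, again using the placeholder tag mistral-7b-cpu, overriding a few of the values above through the options field:

import ollama

response = ollama.generate(
    model="mistral-7b-cpu",  # placeholder tag, substitute your variant
    prompt="Explain quantization in one paragraph.",
    options={
        "num_ctx": 2048,     # widen the context window past the 1024 default
        "num_thread": 4,     # match your machine's physical core count
        "temperature": 0.2,  # more deterministic than the 0.7 default
    },
)
print(response["response"])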