3,426 1 year ago

It uses this one Q4_K_M-imat (4.89 BPW) quant for up to 12288 context sizes. for less than 8gb vram

vision
8ab4849b038c · 254B
{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>