This 8B vision model uses a single Q4_K_M imatrix quant (4.89 BPW), which supports context sizes up to 12288 in under 8 GB of VRAM.
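
To hold the full 12288-token window, you can derive a local variant with a Modelfile; a minimal sketch, assuming the tag vision-8b stands in for this model's actual name:

# Modelfile: build on this quant and pin the context window (tag is a placeholder)
FROM vision-8b:latest
PARAMETER num_ctx 12288

Create and run it with ollama create vision-8b-12k -f Modelfile, then ollama run vision-8b-12k.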


{ "stop": [ "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>" ] }