It uses a single Q4_K_M-imat (4.89 BPW) quant, for context sizes up to 12288 on less than 8 GB of VRAM.
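
As a rough sanity check on the VRAM figure, the bits-per-weight value can be turned into an approximate weight size. This is only a sketch: the parameter count below is a hypothetical placeholder (the description does not state it), and the estimate covers the weights alone, not the KV cache for the context window or any runtime overhead.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    # bits -> bytes (divide by 8), then bytes -> GB (divide by 1e9)
    return n_params * bits_per_weight / 8 / 1e9


# ASSUMPTION: hypothetical parameter count, not taken from the description.
ASSUMED_PARAMS = 8e9
# Bits per weight as quoted for the Q4_K_M-imat quant.
BPW = 4.89

print(f"~{weight_size_gb(ASSUMED_PARAMS, BPW):.2f} GB for weights only")
```

Whatever is left of the VRAM budget after the weights has to hold the KV cache, which grows with the context size.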