
Readme

This build is configured with a longer sequence length. I recommend running with flash attention and KV-cache quantization if you run out of VRAM.
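
If you serve the model through Ollama, flash attention and KV-cache quantization are controlled by the `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` environment variables on the server. As an illustration only, the sketch below shows equivalent settings when loading a GGUF directly with llama-cpp-python; the model path, context length, and quantization type are placeholder assumptions, not values from this card.

```python
# Hedged sketch: load a GGUF with llama-cpp-python, enable flash attention,
# and quantize the KV cache to cut VRAM use at long context.
# The path, n_ctx, and q8_0 choice are illustrative assumptions.
from llama_cpp import Llama, GGML_TYPE_Q8_0

llm = Llama(
    model_path="model.gguf",   # placeholder path to the GGUF weights
    n_ctx=32768,               # longer sequence length; adjust to this model's configured value
    n_gpu_layers=-1,           # offload all layers to the GPU
    flash_attn=True,           # flash attention reduces attention memory overhead
    type_k=GGML_TYPE_Q8_0,     # quantize the K cache to 8-bit
    type_v=GGML_TYPE_Q8_0,     # V-cache quantization requires flash_attn=True
)

out = llm("Q: Why quantize the KV cache? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

The trade-off: an 8-bit (q8_0) KV cache takes roughly half the memory of the default f16 cache, typically with little quality loss, which is what lets the longer context fit in limited VRAM.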