
ollama run cyberuser42/DeepSeek-R1-Distill-Qwen-7B


Readme

This build is configured with a longer sequence length (context window) than the default. If you run out of VRAM, I recommend enabling flash attention and KV-cache quantization.
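As a minimal sketch of that recommendation, the settings below use the environment variables that the Ollama server reads at startup. It assumes a recent Ollama release that supports both variables. The `q8_0` cache type is an illustrative choice on my part, not a value stated in this README.

```shell
# Enable flash attention in the Ollama server (KV-cache quantization depends on it).
export OLLAMA_FLASH_ATTENTION=1

# Quantize the KV cache to 8-bit. This is an assumed value for illustration:
# f16 is the default, and q4_0 saves more memory at a larger quality cost.
export OLLAMA_KV_CACHE_TYPE=q8_0

# These variables are read by the server process, not by `ollama run`,
# so restart the server from this shell afterwards, for example:
#   ollama serve
```

With the server restarted under these settings, run the model with the command shown above. If quality drops noticeably, switching `OLLAMA_KV_CACHE_TYPE` back to `f16` is the first thing to try.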