27.9K pulls · Updated 1 month ago

Specialized uncensored quants for the new OpenAI 20B MoE (Mixture of Experts) model, running at 80+ tokens/s. The "HERETIC" method produces an uncensored model (quantized to Q5_1).

20b (thinking):

```
ollama run second_constantine/gpt-oss-u:20b
```
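The benchmarks below include a 128k-context run. One way to run with a context window larger than the default is a custom Modelfile; this is a sketch, and the `num_ctx` value here is only an example you would tune to your available memory:

```
# Hypothetical Modelfile: base it on the model from this page
FROM second_constantine/gpt-oss-u:20b
# Request a larger context window (example value, adjust to your hardware)
PARAMETER num_ctx 131072
```

Then build and run the derived model with `ollama create gpt-oss-u-128k -f Modelfile` followed by `ollama run gpt-oss-u-128k`.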


Readme

Based on https://huggingface.co/DavidAU/OpenAi-GPT-oss-20b-HERETIC-uncensored-NEO-Imatrix-gguf

| Feature  | Value |
|----------|-------|
| vision   | false |
| thinking | true (cannot be switched off) |
| tools    | not working |
| Device | Speed, tokens/s | Context | Memory, GB | Quant, Ollama version |
|--------|-----------------|---------|------------|-----------------------|
| RTX 3090 24 GB | ~187 | 8192 | 16 | Q5_1, 0.13.5 |
| RTX 3090 24 GB | ~189 | 16384 | 16 | Q5_1, 0.13.5 |
| RTX 3090 24 GB | ~188 | 128k | 19 | Q5_1, 0.13.5 |
| M1 Max 32 GB | ~59 | 8192 | 16 | Q5_1, 0.13.5 |
| i7-12700H + 3070 Ti Mobile 8 GB | ~12 | 8192 | 16 (55%/45% CPU/GPU) | Q5_1, 0.13.5 |
| i5-1235U | ~6 | 4096 | 16 (100% CPU) | Q5_1, 0.13.5 |
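To give the speed column a practical feel, a quick sketch converting the table's approximate tokens/s figures into wall-clock generation time (the device names and rates are taken from the rows above; the 1000-token response length is an arbitrary example):

```python
# Approximate decode speeds (tokens/s) reported in the benchmark table above.
speeds = {
    "RTX 3090 24 GB": 187,
    "M1 Max 32 GB": 59,
    "i7-12700H + 3070 Ti Mobile": 12,
    "i5-1235U": 6,
}

def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_second

for device, tps in speeds.items():
    print(f"{device}: ~{generation_time(1000, tps):.0f} s per 1000 tokens")
```

So a 1000-token answer takes roughly 5 seconds on the RTX 3090 but closer to three minutes on the CPU-only i5-1235U.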