- Quantization from `fp32`
- Using i-matrix `calibration_datav3.txt`
- New template (a quick check is sketched after this list):
  - should work with `flash_attention`
  - doesn't forget the `SYSTEM` prompt
  - doesn't forget the context
  - should work with
- N.B.: if the output breaks, ask for `repeat` (but it shouldn't with these quants)
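As a quick way to confirm the template keeps the `SYSTEM` prompt and earlier turns in context, here is a minimal sketch against a local Ollama server's `/api/chat` endpoint. The model tag is a placeholder for this repo's quant, and flash attention is assumed to be enabled server-side via the `OLLAMA_FLASH_ATTENTION=1` environment variable; adjust both to your setup.

```python
# Minimal sketch: exercise the SYSTEM prompt and multi-turn context through
# a local Ollama server (http://localhost:11434). The model tag below is a
# placeholder -- substitute the tag of this repo's quant. Flash attention is
# enabled on the server side (OLLAMA_FLASH_ATTENTION=1), not from this client.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "deepseek-coder-v2"  # placeholder tag

messages = [
    {"role": "system", "content": "You are a terse Python assistant. Always answer with code only."},
    {"role": "user", "content": "Write a function that reverses a string."},
]

# First turn.
resp = requests.post(OLLAMA_URL, json={"model": MODEL, "messages": messages, "stream": False})
resp.raise_for_status()
messages.append(resp.json()["message"])  # keep the assistant reply in the history

# Second turn refers back to the first answer, so a correct template must
# still see both the SYSTEM prompt and the earlier turns.
messages.append({"role": "user", "content": "Now add type hints to that function."})
resp = requests.post(OLLAMA_URL, json={"model": MODEL, "messages": messages, "stream": False})
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

If the template were dropping the system prompt or the context, the second reply would typically ignore the "code only" instruction or lose track of the earlier function.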
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. It is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens drawn from a high-quality, multi-source corpus.
Maximum context length: 128K
- `q4_0` on an RTX 3090 (24 GB) fits in VRAM with up to 46K context (see the sketch below)
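To actually use a long window, the context size has to be requested at run time. A minimal sketch, assuming a local Ollama server and a hypothetical `:q4_0` tag for this repo's quant, asking for roughly 46K tokens of context through the `num_ctx` option of the `/api/generate` endpoint:

```python
# Minimal sketch: request a ~46K-token context window for the q4_0 quant so
# the weights plus KV cache stay within a 24 GB card's VRAM.
# The model tag is a placeholder for this repo's q4_0 quant.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder-v2:q4_0",  # placeholder tag
        "prompt": "Summarize the following file ...",  # long prompt goes here
        "stream": False,
        "options": {"num_ctx": 46000},  # ~46K context; raise toward 128K only if VRAM allows
    },
)
resp.raise_for_status()
print(resp.json()["response"])
```

Larger `num_ctx` values grow the KV cache beyond what a 24 GB card can hold, so on an RTX 3090 keep it around 46K for the `q4_0` quant.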