An open-source Mixture-of-Experts code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks.
558 Pulls Updated 2 months ago
a3f7e2de4903 · 8.9GB
model
arch deepseek2 · parameters 15.7B · quantization Q4_0 · 8.9GB
params
{"num_ctx":2048,"num_predict":2048,"stop":["User:","Assistant:","<|begin▁of▁sentence|…
148B
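The default `num_ctx` in the params above is 2048, far below the model's 128K maximum. A minimal sketch of raising it by deriving a variant via an Ollama Modelfile — the base tag `deepseek-coder-v2` here is illustrative; substitute this model's actual tag:

```
# Hypothetical Modelfile: derive a variant with a larger context window.
FROM deepseek-coder-v2
PARAMETER num_ctx 16384
```

Build it with `ollama create my-coder -f Modelfile`, then run it with `ollama run my-coder`.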
template
{{ if not .Response }}{{ if .System }}{{ .System }}
{{ end }}{{ end }}{{ if .Prompt }}User: {{ .Prom
210B
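The stop strings in the params ("User:", "Assistant:") suggest a simple turn-based prompt format. A minimal Python sketch — an assumption on my part, since the Go template above is truncated — of what a rendered prompt might look like:

```python
# Sketch only: approximates what the (truncated) Go template above appears
# to render, inferred from the "User:"/"Assistant:" stop tokens in params.
def render_prompt(system: str, prompt: str, response: str = "") -> str:
    parts = []
    if not response and system:  # system text only before the first response
        parts.append(system + "\n")
    if prompt:
        parts.append("User: " + prompt + "\n")
    parts.append("Assistant: " + response)
    return "".join(parts)

print(render_prompt("You are a code assistant.", "Write FizzBuzz in Go."))
```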
license
MIT License
Copyright (c) 2023 DeepSeek
Permission is hereby granted, free of charge, to any perso
1.1kB
license
DEEPSEEK LICENSE AGREEMENT
Version 1.0, 23 October 2023
Copyright (c) 2023 DeepSeek
Section I: PR
14kB
Readme
- Quantization from fp32
- Using i-matrix calibration_datav3.txt
- New template:
  - should work with flash_attention
  - doesn’t forget the SYSTEM prompt
  - doesn’t forget the context
  - should work with
- N.B.: if the output breaks, ask for repeat (but it shouldn’t with these quants)
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens sourced from a high-quality and multi-source corpus.
Maximum context length: 128K
- q4_0 on an RTX 3090 24GB can fit in VRAM up to 46K context
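To use a larger context on the API side, Ollama's `/api/generate` endpoint accepts per-request `options`. A hedged sketch of a request body targeting the ~46K context reported above — the model tag `deepseek-coder-v2` is illustrative:

```python
import json

# Sketch: request body for Ollama's /api/generate endpoint. The tag
# "deepseek-coder-v2" is illustrative; num_ctx 46000 follows the q4_0 /
# RTX 3090 24GB figure quoted above.
payload = {
    "model": "deepseek-coder-v2",
    "prompt": "Explain this diff ...",
    "stream": False,
    "options": {"num_ctx": 46000, "num_predict": 2048},
}
body = json.dumps(payload)
# POST this body to http://localhost:11434/api/generate with an HTTP client.
```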