3,497 1 year ago

An open-source Mixture-of-Experts code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks.

1 year ago

994a6e62c235 · 8.1GB ·

deepseek2
·
15.7B
·
Q3_K_M
DEEPSEEK LICENSE AGREEMENT Version 1.0, 23 October 2023 Copyright (c) 2023 DeepSeek Section I: PREAM
MIT License Copyright (c) 2023 DeepSeek Permission is hereby granted, free of charge, to any person
{ "num_ctx": 2048, "num_predict": 2048, "stop": [ "User:", "Assistant:",
{{ if not .Response }}{{ if .System }}{{ .System }} {{ end }}{{ end }}{{ if .Prompt }}User: {{ .Prom

Readme

  • Quantization from fp32
  • Using i-matrix calibration_datav3.txt
  • New template:
    • should work with flash_attention
    • doesn’t forget the SYSTEM prompt
    • doesn’t forget the context
  • N.B: if the output breaks ask for repeat (but it shouldn’t with these quants)

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens sourced from a high-quality and multi-source corpus.

Maximum context length: 128K - q4_0 on a RTX3090 24GB can fit in VRAM up to 46K context

References

Hugging Face