mannix/deepseek-coder-v2-lite-instruct:iq3_xxs

7,500 Downloads · Updated 1 year ago

An open-source Mixture-of-Experts code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks.

ollama run mannix/deepseek-coder-v2-lite-instruct:iq3_xxs

curl http://localhost:11434/api/chat \
  -d '{
    "model": "mannix/deepseek-coder-v2-lite-instruct:iq3_xxs",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='mannix/deepseek-coder-v2-lite-instruct:iq3_xxs',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'mannix/deepseek-coder-v2-lite-instruct:iq3_xxs',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details

Updated 2 years ago · aa07d2877393 · 7.0GB

model (7.0GB): arch deepseek2 · parameters 15.7B · quantization IQ3_XXS

license (1.1kB)

license (14kB)

params (148B): { "num_ctx": 2048, "num_predict": 2048, "stop": [ "User:", "Assistant:",

template (210B): {{ if not .Response }}{{ if .System }}{{ .System }} {{ end }}{{ end }}{{ if .Prompt }}User: {{ .Prom
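
The "stop" entries in the params tell Ollama to end generation when one of those strings appears, which keeps the model from writing the next "User:" turn itself. The snippet below is a rough illustration of that effect in plain Python, not Ollama's actual implementation. It uses only the two stop strings visible in the truncated params line, and truncate_at_stop is a hypothetical helper name.

```python
def truncate_at_stop(text: str, stops: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Only the stop strings visible in the (truncated) params line; the real
# list may contain more entries.
VISIBLE_STOPS = ["User:", "Assistant:"]

raw = "def add(a, b):\n    return a + b\nUser: thanks"
print(truncate_at_stop(raw, VISIBLE_STOPS))
```

Everything from "User:" onward is dropped, so only the function definition is printed.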

Readme

Quantized from fp32.
Importance matrix (i-matrix) computed with calibration_datav3.txt.
New template:
- should work with flash_attention
- doesn’t forget the SYSTEM prompt
- doesn’t forget the context
N.B.: if the output breaks, ask the model to repeat its answer (this shouldn’t happen with these quants).

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens sourced from a high-quality and multi-source corpus.

Maximum context length: 128K. The q4_0 quant on an RTX 3090 (24GB) fits up to 46K context in VRAM.
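
The params listed under Details set num_ctx to 2048, far below the 128K maximum, so a larger window has to be requested explicitly. The following is a minimal sketch of one way to do that, assuming a local Ollama server on the default port (as in the curl example) and using the "options" field of the /api/chat request. The helper names, the 8192 default, and reading 128K as 128 * 1024 tokens are assumptions for illustration.

```python
import json
import urllib.request

MODEL = "mannix/deepseek-coder-v2-lite-instruct:iq3_xxs"
MAX_CTX = 128 * 1024  # assumption: "128K" interpreted as 131072 tokens


def build_chat_payload(prompt: str, num_ctx: int = 8192) -> dict:
    """Build an /api/chat request body that overrides the 2048-token default."""
    if not 1 <= num_ctx <= MAX_CTX:
        raise ValueError(f"num_ctx must be between 1 and {MAX_CTX}")
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "options": {"num_ctx": num_ctx},
        "stream": False,  # return one JSON object instead of streamed chunks
    }


def send_chat(payload: dict, url: str = "http://localhost:11434/api/chat") -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]["content"]
```

With the server running, `print(send_chat(build_chat_payload("Hello!", num_ctx=16384)))` would request a 16K window. Larger values increase memory use, so the usable ceiling depends on the available VRAM.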

References

Hugging Face