
16k context window, so it needs less RAM to run. The full context window is loaded in deepseekq3_coder: the RAM needed for the context is allocated when the model is loaded.
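To give a rough sense of why a smaller context window lowers RAM use, here is a minimal sketch that estimates the size of the key/value cache reserved for the context. It assumes a standard attention layout with a 16-bit cache. The layer count, KV head count, and head dimension are hypothetical placeholders, not values taken from this model, so only the scaling with `num_ctx` is meaningful.

```python
def kv_cache_bytes(num_ctx, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV-cache size in bytes.

    Each token stores one key and one value vector (hence the factor 2)
    per layer and per KV head. bytes_per_value=2 assumes a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * num_ctx


# Hypothetical dimensions for illustration only (NOT from this model).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

for ctx in (16000, 32000, 128000):
    gib = kv_cache_bytes(ctx, N_LAYERS, N_KV_HEADS, HEAD_DIM) / 2**30
    print(f"num_ctx={ctx:>6}: ~{gib:.2f} GiB reserved for the KV cache")
```

Under these assumptions the estimate grows linearly with `num_ctx`, so halving the context window roughly halves the memory set aside for it at load time.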

tools thinking
847eb337de75 · 216B
{
  "num_ctx": 16000,
  "seed": 42,
  "stop": [
    "<|begin▁of▁sentence|>",
    "<|end▁of▁sentence|>",
    "<|User|>",
    "<|Assistant|>"
  ],
  "temperature": 0.1,
  "top_k": 50,
  "top_p": 0.95
}
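The parameters above are baked into the model, but the same values can also be sent per request. The sketch below assumes the model is served by Ollama and builds a request body for its `/api/generate` endpoint, which accepts these keys under `options`. The model name `deepseekq3_coder` is taken from the description and may not match the exact tag, and the prompt is a made-up example. Nothing is sent over the network; the code only builds and prints the payload.

```python
import json

# Same values as the params block above.
PARAMS = {
    "num_ctx": 16000,
    "seed": 42,
    "stop": [
        "<|begin▁of▁sentence|>",
        "<|end▁of▁sentence|>",
        "<|User|>",
        "<|Assistant|>",
    ],
    "temperature": 0.1,
    "top_k": 50,
    "top_p": 0.95,
}


def build_payload(prompt, model="deepseekq3_coder", **overrides):
    """Build a request body for POST /api/generate (assumed Ollama server).

    Keyword overrides replace individual options, e.g. num_ctx=8000.
    The model name is an assumption; use the tag you actually pulled.
    """
    options = {**PARAMS, **overrides}
    return {"model": model, "prompt": prompt, "stream": False, "options": options}


payload = build_payload("Write a function that reverses a string.")
print(json.dumps(payload, indent=2, ensure_ascii=False))
```

A fixed `seed` together with a low `temperature` keeps output close to deterministic, which is a common choice for code generation. Passing a smaller value such as `num_ctx=8000` as an override would be one way to trade context length for memory.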