huihui_ai/kimi-k2:1026b-instruct-Q2_K

8,361 2 months ago

This is not the ablation version. Kimi-K2-Instruct is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters.

tools 1026b

f72d14c3db1b · 373GB · deepseek2 · 1.03T · Q2_K
{{ if .Tools }}<|im_system|>tool_declare<|im_middle|>{{ .Tools }}<|im_end|>{{ end }} {{ range $index
Modified MIT License Copyright (c) 2025 Moonshot AI Permission is hereby granted, free of charge, to
{ "num_gpu": 1, "repeat_penalty": 1, "stop": [ "<|im_start|>", "<|im_end

Readme

The current version of Ollama (0.9.6) sets LLAMA_MAX_EXPERTS to 256 in llama-hparams.h. To run this model properly, you must manually change that value to 384 and recompile Ollama.
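The header change described above can be sketched as a small text patch. This is a minimal sketch, assuming the limit is written as a single `#define LLAMA_MAX_EXPERTS <number>` line; check your copy of llama-hparams.h before relying on it. The function name `raise_max_experts` is hypothetical.

```python
import re

# Assumed form of the definition: "#define LLAMA_MAX_EXPERTS <number>".
_DEFINE = re.compile(
    r"^(\s*#\s*define\s+LLAMA_MAX_EXPERTS\s+)(\d+)", re.MULTILINE
)


def raise_max_experts(header_text: str, new_limit: int = 384) -> str:
    """Return header_text with the LLAMA_MAX_EXPERTS value set to new_limit."""
    patched, count = _DEFINE.subn(
        lambda m: f"{m.group(1)}{new_limit}", header_text
    )
    if count != 1:
        # Refuse to guess if the header does not look the way we assumed.
        raise ValueError(
            f"expected exactly one LLAMA_MAX_EXPERTS define, found {count}"
        )
    return patched
```

To apply it, read llama-hparams.h from your Ollama source tree, pass its contents through `raise_max_experts`, write the result back, and then recompile. The location of the file inside the tree depends on your checkout.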

Parameter description

1. num_gpu
The model's built-in num_gpu value is 1, so by default only one layer is loaded onto the GPU and all other layers are loaded into CPU memory. You can change num_gpu to match your GPU configuration.

/set parameter num_gpu 2

2. num_thread
num_thread sets the number of CPU threads used for inference. It is recommended to set it to half the number of cores in your computer; otherwise, the CPU will run at 100%.

/set parameter num_thread 32
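The half-of-the-cores recommendation can be computed rather than guessed. This is a minimal sketch; the helper name `suggested_num_thread` is hypothetical. Note that `os.cpu_count()` reports logical CPUs, so on machines with simultaneous multithreading the result may already equal the number of physical cores.

```python
import os


def suggested_num_thread(logical_cpus=None) -> int:
    """Return half of the reported CPU count, but never less than 1."""
    if logical_cpus is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus // 2)


# Print the command to paste into the Ollama prompt.
print(f"/set parameter num_thread {suggested_num_thread()}")
```

On a machine that reports 64 CPUs, this prints the same `/set parameter num_thread 32` command shown above.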

3. num_ctx
In Ollama, num_ctx sets the size of the context window, that is, the number of tokens the model can keep in context during inference.

/set parameter num_ctx 4096
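The same three parameters can also be passed per request through Ollama's HTTP API instead of `/set parameter`. This is a minimal sketch, assuming Ollama is listening on its default address (http://localhost:11434) and that the model is pulled under the name formed from this page's title; the function names are hypothetical, and the default values are just the examples used above.

```python
import json
import urllib.request

# Assumption: model name taken from the title of this page.
MODEL = "huihui_ai/kimi-k2:1026b-instruct-Q2_K"
# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, num_gpu=2, num_thread=32, num_ctx=4096) -> bytes:
    """Build the JSON body for a non-streaming /api/generate call."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_gpu": num_gpu,        # layers offloaded to the GPU
            "num_thread": num_thread,  # CPU threads used for inference
            "num_ctx": num_ctx,        # context window size in tokens
        },
    }
    return json.dumps(payload).encode("utf-8")


def generate(prompt: str) -> str:
    """Send the request to a running Ollama server and return its reply text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_generate_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("Hello")` requires a running server with the model loaded; `build_generate_request` on its own only assembles the request body.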

References

HuggingFace

moonshotai/Kimi-K2-Instruct

Donation

You can follow x.com/support_huihui to get the latest model information from huihui.ai.

Your donation helps us continue development and improvement. Even a cup of coffee helps.
  • bitcoin: bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge