
GigaChat3-10B-A1.8B is a dialogue model in the GigaChat family. It is built on a Mixture-of-Experts (MoE) architecture with 10B total parameters, of which 1.8B are active per token. The architecture also includes Multi-head Latent Attention (MLA) and Multi-Token Prediction (MTP).
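To illustrate why only 1.8B of the 10B parameters are active per token, here is a minimal sketch of MoE top-k routing in NumPy. This is illustrative only, not GigaChat's actual implementation: a router scores all experts, but each token is processed by only the k best ones, so most expert weights stay idle for that token.

```python
import numpy as np

def moe_forward(x, experts_w, router_w, k=2):
    """x: (d,), experts_w: (n_experts, d, d), router_w: (n_experts, d)."""
    logits = router_w @ x                  # router score per expert
    topk = np.argsort(logits)[-k:]        # indices of the k best experts
    gates = np.exp(logits[topk] - logits[topk].max())
    gates /= gates.sum()                  # softmax over the selected experts only
    out = np.zeros_like(x)
    for g, i in zip(gates, topk):
        out += g * (experts_w[i] @ x)     # gated sum of expert outputs
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
experts = rng.standard_normal((n_experts, d, d))
router = rng.standard_normal((n_experts, d))
y = moe_forward(x, experts, router, k=2)
print(y.shape)  # (8,)
```

With k=2 of 16 experts selected, only 1/8 of the expert weights participate in each forward pass, which is the same principle behind the 1.8B-active / 10B-total split.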

99eebedd34ac · 175B
{
  "num_ctx": 8192,
  "stop": [
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eot_id|>",
    "<|end__header_id|>",
    "<|start__header_id|>"
  ]
}
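These parameters can also be supplied per request through Ollama's generate API (`num_ctx` and `stop` are standard Ollama options). A sketch of the request payload, assuming a hypothetical model tag:

```python
import json

payload = {
    "model": "gigachat3:10b-a1.8b",  # model tag is an assumption
    "prompt": "Hello!",
    "stream": False,
    "options": {
        "num_ctx": 8192,             # context window, as in the params above
        "stop": [                    # stop sequences, as in the params above
            "<|start_header_id|>",
            "<|end_header_id|>",
            "<|eot_id|>",
        ],
    },
}

# POST this JSON to http://localhost:11434/api/generate on a running Ollama server.
print(json.dumps(payload["options"], indent=2))
```

Options sent this way override the defaults baked into the model's params file for that request only.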