
GigaChat3-10B-A1.8B is a dialogue model in the GigaChat family. It uses a Mixture-of-Experts (MoE) architecture with 10B total parameters, of which 1.8B are active per token. The architecture also includes Multi-head Latent Attention (MLA) and Multi-Token Prediction (MTP).

8ab4849b038c · 254B
{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>
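
The template above uses Go `text/template` syntax, so its output can be previewed with the standard library alone. The following is a minimal sketch, not the runtime's actual rendering path: the `turn` struct, the `render` helper, and the sample messages are hypothetical names chosen for illustration, and only the template string is taken from the text above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// chatTemplate reproduces the template above verbatim.
const chatTemplate = `{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>`

// turn is a hypothetical container for the three fields the template reads.
type turn struct {
	System   string
	Prompt   string
	Response string
}

// render is a hypothetical helper that fills the template with one turn.
func render(t turn) (string, error) {
	tmpl, err := template.New("chat").Parse(chatTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, t); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Sample values are assumptions for illustration only.
	out, err := render(turn{
		System: "You are a helpful assistant.",
		Prompt: "Hello!",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

In this sketch, an empty `System` or `Prompt` drops the corresponding block entirely because of the `{{ if }}` guards, while the assistant header is always emitted. Since the sketch renders the whole template at once, an empty `Response` still yields a trailing `<|eot_id|>` after the assistant header.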