A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
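To illustrate what "37B of 671B parameters activated per token" means in practice, here is a minimal top-k MoE routing sketch in PyTorch. The expert count, top-k value, and dimensions are illustrative assumptions, not this model's actual configuration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    # Toy MoE layer: a router scores all experts, but only the top-k
    # experts run per token, so most parameters stay idle each step.
    def __init__(self, dim: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        scores = self.gate(x)                              # (tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1) # pick k experts per token
        weights = F.softmax(weights, dim=-1)               # normalize chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = indices[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

x = torch.randn(4, 64)          # 4 tokens
print(TopKMoE()(x).shape)       # torch.Size([4, 64])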
671b
d318c0731575 · 160B
{
  "num_gpu": 1,
  "stop": [
    "<|begin▁of▁sentence|>",
    "<|end▁of▁sentence|>",
    "<|User|>",
    "<|Assistant|>"
  ]
}
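These parameters can also be overridden per request through Ollama's REST API via the "options" field. Below is a minimal sketch assuming a local Ollama server on the default port 11434; the model name is a placeholder, so substitute the actual tag from this page:

import json
import urllib.request

payload = {
    "model": "<model>:671b",  # placeholder; use this page's actual model tag
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {
        "num_gpu": 1,  # number of layers to offload to the GPU
        "stop": [
            "<|begin▁of▁sentence|>",
            "<|end▁of▁sentence|>",
            "<|User|>",
            "<|Assistant|>",
        ],
    },
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])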