A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
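
The key idea behind the "671B total / 37B activated" figures is sparse routing: each token is sent to only a few experts, so most parameters sit idle on any given forward pass. Below is a minimal, hypothetical sketch of a top-k gated MoE layer in PyTorch; the dimensions, expert count, and `k` are illustrative only and are not the actual configuration of this model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative sparse MoE layer: only k of n_experts run per token,
    so the activated parameter count is a small fraction of the total."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.gate(x)                      # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1) # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)       # normalize the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # route each token to its selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)
layer = TopKMoE()
print(layer(x).shape)  # torch.Size([4, 64])
```

With `k=2` of 8 experts active, roughly a quarter of the expert parameters participate per token, which is the same mechanism that lets a 671B-parameter model activate only 37B per token.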

Tags (2):
- 12b8c735ab7d • 244GB • updated 2 months ago
- bce4e76cd607 • 319GB • updated 2 months ago