A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
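The gap between total and activated parameters comes from top-k expert routing: each token's router picks only a few experts, so only that subset of the weights does any work. A minimal toy sketch of this idea (all dimensions, the router, and the expert matrices here are hypothetical and vastly smaller than the real model's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes -- the real model has 671B total / 37B active params.
d_model, n_experts, top_k = 8, 16, 2

# Each "expert" is just a weight matrix; the router scores all of them
# but only the top_k chosen ones are ever multiplied against the token.
experts = rng.standard_normal((n_experts, d_model, d_model))
router_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route token vector x to its top_k experts and mix their outputs."""
    logits = x @ router_w                 # router score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the chosen experts only
    # Only top_k of the n_experts matrices are touched: the "activated" subset.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # (8,)
```

With these toy numbers only 2 of 16 experts run per token; the same ratio logic is why a 671B-parameter model costs roughly as much per token as a 37B dense one.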


5 Tags
360f2525dad4 • 404GB • 2 weeks ago
360f2525dad4 • 404GB • 2 weeks ago
49c763520a73 • 244GB • 2 weeks ago
417546ffc361 • 319GB • 2 weeks ago
360f2525dad4 • 404GB • 2 weeks ago