
Our first Mixture-of-Experts model, with 800 million active parameters, reaches state-of-the-art performance in the sub-1B-parameter range.

tools