State-of-the-art large embedding model from mixedbread.ai

embedding 335m



mxbai-embed-large

As of March 2024, this model achieves state-of-the-art (SOTA) performance for BERT-large-sized models on MTEB (the Massive Text Embedding Benchmark). It outperforms commercial models such as OpenAI's text-embedding-3-large and matches the performance of models 20x its size.

mxbai-embed-large was trained on data with no overlap with MTEB, so its benchmark results indicate that the model generalizes well across domains, tasks, and text lengths.

Usage

REST API

curl http://localhost:11434/api/embeddings -d '{
  "model": "mxbai-embed-large",
  "prompt": "Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering"
}'
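If you would rather not install a client library, you can reproduce the curl request above with Python's standard library alone. This is a minimal sketch. It assumes a local Ollama server on the default port 11434 and that the /api/embeddings endpoint replies with a JSON object holding an "embedding" array. The helper names `build_embeddings_request` and `fetch_embedding` are hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumption: a local Ollama server on its default port, as in the curl example.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embeddings_request(prompt: str, model: str = "mxbai-embed-large",
                             url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request carrying the same JSON body as the curl example."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(prompt: str, model: str = "mxbai-embed-large") -> List[float]:
    """Send the request and return the vector from the response's "embedding" field."""
    req = build_embeddings_request(prompt, model)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]
```

For example, `fetch_embedding('Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering')` returns the query vector as a list of floats. Keeping request construction separate from sending means you can inspect the payload without a running server.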

Python library

import ollama
ollama.embeddings(model='mxbai-embed-large', prompt='Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering')

JavaScript library

import ollama from 'ollama'
await ollama.embeddings({ model: 'mxbai-embed-large', prompt: 'Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering' })
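A single embedding is only a vector. Retrieval typically compares a query vector against passage vectors using cosine similarity, and the sketch below ranks passages that way. It assumes, as mixedbread's model card describes, that only the query carries the "Represent this sentence for searching relevant passages: " prefix and that passages are embedded as-is. `rank_passages` and the injected `embed` callable are hypothetical names, so the ranking logic can run without a server.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Query-side prefix from the examples on this page.
# Assumption: passages are embedded without any prefix.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(
    query: str,
    passages: Sequence[str],
    embed: Callable[[str], Sequence[float]],
) -> List[Tuple[float, str]]:
    """Return (score, passage) pairs sorted from most to least similar to the query."""
    # Only the query gets the instruction prefix; passages are embedded unchanged.
    query_vec = embed(QUERY_PREFIX + query)
    scored = [(cosine_similarity(query_vec, embed(p)), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With the Python library, a real `embed` could be `lambda text: ollama.embeddings(model='mxbai-embed-large', prompt=text)['embedding']`. In practice you would embed passages once and cache the vectors rather than recomputing them for every query.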

References

Blog post

Hugging Face