mxbai-embed-large:v1

6.9M Downloads Updated 1 year ago

State-of-the-art large embedding model from mixedbread.ai

embedding 335m

```
ollama pull mxbai-embed-large:v1
```

```
curl http://localhost:11434/api/embed \
  -d '{
    "model": "mxbai-embed-large:v1",
    "input": "Why is the sky blue?"
  }'
```

```
import ollama

response = ollama.embed(
    model='mxbai-embed-large:v1',
    input='The sky is blue because of Rayleigh scattering',
)
print(response.embeddings)
```

```
import ollama from 'ollama'

const response = await ollama.embed({
  model: 'mxbai-embed-large:v1',
  input: 'The sky is blue because of Rayleigh scattering',
})
console.log(response.embeddings)
```
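
The `/api/embed` endpoint shown above also accepts a list of strings as `input` and returns one vector per string under `embeddings`. The following is a minimal sketch of a batched call using only the Python standard library. The helper names `build_embed_request` and `embed_batch` are hypothetical and not part of any Ollama library, and the URL assumes the default local server from the curl example.

```python
import json
import urllib.request

# Default local endpoint, taken from the curl example above.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_embed_request(texts, model="mxbai-embed-large:v1", url=OLLAMA_URL):
    """Build a POST request that embeds several texts in one call.

    Hypothetical helper: it only assembles the request and sends nothing.
    """
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_batch(texts, model="mxbai-embed-large:v1"):
    """Send the request and return one vector per input text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_embed_request(texts, model)) as resp:
        return json.load(resp)["embeddings"]
```

With a server running, `embed_batch(["Why is the sky blue?", "Why is grass green?"])` should return a list of two vectors. Batching this way avoids one HTTP round trip per text.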

Details

468836162de7 · 670MB

model: arch bert · parameters 334M · quantization F16 (670MB)

license: Apache License Version 2.0, January 2004, http://www.apache.org/licenses/ (11kB)

params: { "num_ctx": 512 } (16B)

## mxbai-embed-large

<img src="https://github.com/ollama/ollama/assets/251292/215cfb6a-8efa-4e9b-824d-e5f466b58c49" width="400">

As of March 2024, this model achieves SOTA performance for BERT-large sized models on the MTEB. It outperforms commercial models like OpenAI's `text-embedding-3-large` model and matches the performance of models 20x its size.

`mxbai-embed-large` was trained with no overlap with the MTEB data, which indicates that the model generalizes well across several domains, tasks and text lengths.

## Usage

### REST API

```
curl http://localhost:11434/api/embeddings -d '{
  "model": "mxbai-embed-large",
  "prompt": "Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering"
}'
```

### Python library

```
import ollama
response = ollama.embeddings(model='mxbai-embed-large', prompt='Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering')
```

### Javascript library

```
import ollama from 'ollama'
const response = await ollama.embeddings({ model: 'mxbai-embed-large', prompt: 'Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering' })
```
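
The examples above prepend `Represent this sentence for searching relevant passages: ` to the text being embedded. As a rough sketch of how that prefix might be used in retrieval, the code below embeds a query with the prefix, embeds candidate passages without it, and ranks the passages by cosine similarity. Applying the prefix to queries only is an assumption based on the upstream model card. The names `cosine_similarity`, `ollama_embed`, and `rank_passages` are hypothetical helpers, not part of the Ollama library.

```python
import math

# Prefix used in the usage examples above.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ollama_embed(text):
    """Embed one string with a local Ollama server (needs `pip install ollama`)."""
    import ollama  # imported lazily so the pure helpers work without the package

    response = ollama.embed(model="mxbai-embed-large", input=text)
    return response["embeddings"][0]


def rank_passages(query, passages, embed=ollama_embed):
    """Return (passage, score) pairs sorted from most to least similar to the query.

    Assumption: only the query gets the retrieval prefix; passages are embedded as-is.
    """
    query_vec = embed(QUERY_PREFIX + query)
    scored = [(p, cosine_similarity(query_vec, embed(p))) for p in passages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a server running, `rank_passages("Why is the sky blue?", ["The sky is blue because of Rayleigh scattering", "Bread rises because of yeast"])` returns the passages ordered by similarity. The `embed` parameter makes it possible to swap in a different embedding function, for example a batched one.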

## References

[Blog post](https://www.mixedbread.ai/blog/mxbai-embed-large-v1)

[Hugging Face](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)
