Enbeddrus - English and Russian embedder


Note: this model requires Ollama 0.1.26 or later. It can only be used to generate embeddings.

The Enbeddrus model is designed to extract similar embeddings for comparable English and Russian phrases. It is based on the bert-base-multilingual-uncased model and was trained over 20 epochs on the following datasets:

The goal of this model is to generate identical or very similar embeddings regardless of whether the text is written in English or Russian.
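One way to check the goal stated above is to embed an English phrase and its Russian translation and compare the two vectors with cosine similarity. The sketch below is illustrative rather than part of the model's own tooling. It assumes a local Ollama server on the default port and the `/api/embeddings` endpoint shown in the REST API section; the helper names `embed`, `cosine_similarity`, and `cross_lingual_similarity` are hypothetical.

```python
import json
import math
import urllib.request

# Assumption: a local Ollama server on its default port, as in the curl example.
OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "evilfreelancer/enbeddrus"


def embed(text, url=OLLAMA_URL, model=MODEL):
    """Request one embedding vector from the Ollama REST endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embedding"]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cross_lingual_similarity(english, russian):
    """Embed both phrases and return their cosine similarity."""
    return cosine_similarity(embed(english), embed(russian))
```

With the server running, `cross_lingual_similarity("The sky is blue", "Небо голубое")` should return a value close to 1 if the model behaves as described. No specific score is claimed here.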

Model Versions

The following versions of the model are available:

  • v0.1 - trained only on Russian/English Parallel Corpora, using PHP pairs
  • v0.1-domain - trained first on technical-domain text; the resulting model was then trained on Russian/English Parallel Corpora
  • v0.2 - trained only on Russian/English Parallel Corpora, using PHP and GoLang pairs

Usage

This model is an embedding model, meaning it can only be used to generate embeddings.

REST API

curl http://localhost:11434/api/embeddings -d '{
  "model": "evilfreelancer/enbeddrus",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'

Python library

import ollama

ollama.embeddings(model='evilfreelancer/enbeddrus', prompt='The sky is blue because of Rayleigh scattering')
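Because the embeddings are meant to be language-independent, the same call can drive a simple cross-lingual search: embed an English query, embed Russian documents, and sort by similarity. The following is a minimal sketch under that assumption, not an official recipe. The names `ollama_embed` and `rank_documents` are hypothetical, and the `embed` parameter exists only so the ranking logic can be exercised without a running server.

```python
import math


def ollama_embed(text, model="evilfreelancer/enbeddrus"):
    """Embed one text with the Ollama Python library (needs a running server)."""
    import ollama  # imported lazily so the ranking logic works without it

    return ollama.embeddings(model=model, prompt=text)["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query, documents, embed=ollama_embed):
    """Return (score, document) pairs sorted from most to least similar."""
    query_vector = embed(query)
    scored = [(cosine_similarity(query_vector, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For example, `rank_documents("How do I connect to a database?", ["Как подключиться к базе данных?", "Рецепт борща"])` would be expected to place the database question first, although the actual scores depend on the model version in use.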

Javascript library

import ollama from 'ollama'

ollama.embeddings({ model: 'evilfreelancer/enbeddrus', prompt: 'The sky is blue because of Rayleigh scattering' })

Links