Enbeddrus - English and Russian embedder
618 Pulls Updated 5 months ago
a6a5d9681f7f · 337MB
Note: this model requires Ollama 0.1.26 or later. It can only be used to generate embeddings.
The Enbeddrus model is designed to extract similar embeddings for comparable English and Russian phrases. It is based on the bert-base-multilingual-uncased model and was trained over 20 epochs on the following datasets:
- evilfreelancer/opus-php-en-ru-cleaned (train): 1.6k lines
- evilfreelancer/golang-en-ru: 554 lines
- Helsinki-NLP/opus_books (en-ru, train): 17.5k lines
The goal of this model is to generate identical or very similar embeddings regardless of whether the text is written in English or Russian.
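One way to check this property is to embed an English phrase and its Russian translation and compare the two vectors with cosine similarity. The sketch below is only an illustration and is not part of the original model card. It assumes an Ollama server is running on the default local port with the model already pulled. The helper names `embed`, `cosine_similarity`, and `compare_phrases` are hypothetical.

```python
import json
import math
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "evilfreelancer/enbeddrus"


def embed(text, url=OLLAMA_URL, model=MODEL):
    """Request an embedding for `text` from the Ollama embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embedding"]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def compare_phrases(english, russian, embed_fn=embed):
    """Embed both phrases and return their cosine similarity."""
    return cosine_similarity(embed_fn(english), embed_fn(russian))
```

With the server running, a call such as `compare_phrases("How do I open a file?", "Как открыть файл?")` returns a score between -1 and 1. The closer the score is to 1, the closer the two embeddings are. The `embed_fn` parameter exists only so the comparison can be tried with stand-in vectors when no server is available.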
Model Versions
Three versions of the model are available:
- v0.1 - trained only on Russian/English parallel corpora, using PHP pairs
- v0.1-domain - trained first on technical-domain text; the resulting model was then trained on Russian/English parallel corpora
- v0.2 - trained only on Russian/English parallel corpora, using PHP and Go pairs
Usage
This model is an embedding model, meaning it can only be used to generate embeddings.
REST API
curl http://localhost:11434/api/embeddings -d '{
"model": "evilfreelancer/enbeddrus",
"prompt": "The sky is blue because of Rayleigh scattering"
}'
Python library
import ollama

ollama.embeddings(model='evilfreelancer/enbeddrus', prompt='The sky is blue because of Rayleigh scattering')
JavaScript library
import ollama from 'ollama'

ollama.embeddings({ model: 'evilfreelancer/enbeddrus', prompt: 'The sky is blue because of Rayleigh scattering' })
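Because the model only returns vectors, a common next step is to rank stored texts against a query by vector similarity, for example to match an English query against Russian documents. The sketch below shows one way to do that. It normalizes each vector to unit length so that a plain dot product equals cosine similarity. The function names and the toy two-dimensional vectors are hypothetical stand-ins, not real Enbeddrus output.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def rank_documents(query_vector, document_vectors):
    """Return (document_id, score) pairs sorted from most to least similar.

    `document_vectors` maps a document id to its embedding.
    """
    query = normalize(query_vector)
    scores = []
    for doc_id, vector in document_vectors.items():
        unit = normalize(vector)
        scores.append((doc_id, sum(q * d for q, d in zip(query, unit))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional vectors stand in for real embeddings (hypothetical data).
documents = {"doc_ru": [0.9, 0.1], "doc_en": [0.1, 0.9]}
ranking = rank_documents([1.0, 0.0], documents)
print([doc_id for doc_id, _ in ranking])  # prints ['doc_ru', 'doc_en']
```

In practice the query and document vectors would come from the embeddings calls shown above. For a large collection, normalizing the document vectors once at indexing time avoids repeating that work on every query.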