evilfreelancer / enbeddrus

Enbeddrus - English and Russian embedder

embedding

619 Pulls Updated 5 months ago

8 Tags

a6a5d9681f7f · 337MB

# Enbeddrus - English and Russian embedder

> Note: this model requires Ollama 0.1.26 or later. [Download it here.](https://ollama.com/download) It can only be used to generate embeddings.

The Enbeddrus model is designed to extract similar embeddings for comparable English and Russian phrases. It is based on the [bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-cased) model and was trained over 20 epochs on the following datasets:

- [evilfreelancer/opus-php-en-ru-cleaned](https://huggingface.co/datasets/evilfreelancer/opus-php-en-ru-cleaned) (train): 1.6k lines
- [evilfreelancer/golang-en-ru](https://huggingface.co/datasets/evilfreelancer/golang-en-ru): 554 lines
- [Helsinki-NLP/opus_books](https://huggingface.co/datasets/Helsinki-NLP/opus_books/viewer/en-ru) (en-ru, train): 17.5k lines

The goal of this model is to generate identical or very similar embeddings regardless of whether the text is written in English or Russian.

## Model Versions

Three versions of the model are available:

- v0.1 - trained only on Russian/English parallel corpora, including PHP pairs
- v0.1-domain - first trained on technical-domain text, then the resulting model was trained on Russian/English parallel corpora
- v0.2 - trained only on Russian/English parallel corpora, including PHP and Go pairs

## Usage

This model is an embedding model, meaning it can only be used to generate embeddings.

**REST API**

```bash
curl http://localhost:11434/api/embeddings -d '{
  "model": "evilfreelancer/enbeddrus",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'
```

**Python library**

```python
import ollama
ollama.embeddings(model='evilfreelancer/enbeddrus', prompt='The sky is blue because of Rayleigh scattering')
```
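
Since the goal is near-identical embeddings for equivalent English and Russian text, one way to check it is to embed a phrase in both languages and compare the vectors with cosine similarity. The following is a minimal sketch, not part of the project: the helper names (`ollama_embed`, `cosine_similarity`, `cross_lingual_similarity`) are hypothetical, and it assumes the `ollama` package is installed and that the response exposes the vector under the `embedding` key.

```python
import math
from typing import Callable, Sequence

MODEL = "evilfreelancer/enbeddrus"  # model name from the examples above


def ollama_embed(text: str) -> Sequence[float]:
    """Embed one string through the Ollama Python library (hypothetical helper)."""
    import ollama  # imported lazily so the pure-math helpers work without Ollama

    response = ollama.embeddings(model=MODEL, prompt=text)
    # Assumption: the vector is returned under the "embedding" key.
    return response["embedding"]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cross_lingual_similarity(
    english: str,
    russian: str,
    embed: Callable[[str], Sequence[float]] = ollama_embed,
) -> float:
    """Embed both phrases with the same function and compare them."""
    return cosine_similarity(embed(english), embed(russian))
```

With an Ollama server running and the model pulled, `cross_lingual_similarity("The sky is blue because of Rayleigh scattering", "Небо голубое из-за рэлеевского рассеяния")` should return a value close to 1 if the model behaves as intended. The `embed` parameter is injectable so the comparison logic can be exercised without a server.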

**JavaScript library**

```javascript
import ollama from 'ollama'
await ollama.embeddings({ model: 'evilfreelancer/enbeddrus', prompt: 'The sky is blue because of Rayleigh scattering' })
```

## Links

- https://github.com/EvilFreelancer/enbeddrus
- https://huggingface.co/evilfreelancer/enbeddrus-v0.1
- https://huggingface.co/evilfreelancer/enbeddrus-v0.1-domain
