evilfreelancer/enbeddrus

2,454 Downloads Updated 2 years ago

Enbeddrus - English and Russian embedding model

embedding

ollama pull evilfreelancer/enbeddrus

curl http://localhost:11434/api/embed \
  -d '{
    "model": "evilfreelancer/enbeddrus",
    "input": "Why is the sky blue?"
  }'

import ollama

response = ollama.embed(
    model='evilfreelancer/enbeddrus',
    input='The sky is blue because of Rayleigh scattering',
)
print(response.embeddings)

import ollama from 'ollama'

const response = await ollama.embed({
  model: 'evilfreelancer/enbeddrus',
  input: 'The sky is blue because of Rayleigh scattering',
})
console.log(response.embeddings)

Models

8 models

| Name | Size | Context | Input | Updated |
|---|---|---|---|---|
| enbeddrus:latest | 337MB | 512 | Text | 2 years ago |
| enbeddrus:v0.1 | 337MB | 512 | Text | 2 years ago |
| enbeddrus:v0.2 (latest) | 337MB | 512 | Text | 2 years ago |
| enbeddrus:v0.1-domain | 337MB | 512 | Text | 2 years ago |
| enbeddrus:v0.1-domain-fp16 | 337MB | 512 | Text | 2 years ago |
| enbeddrus:v0.1-fp16 | 337MB | 512 | Text | 2 years ago |
| enbeddrus:v0.2-q8_0 | 181MB | 512 | Text | 2 years ago |
| enbeddrus:v0.2-fp16 | 337MB | 512 | Text | 2 years ago |

Readme

# Enbeddrus - English and Russian embedder

> Note: this model requires Ollama 0.1.26 or later. [Download it here.](https://ollama.com/download) It can only be used to generate embeddings.

Enbeddrus is an embedding model designed to produce similar embeddings for comparable English and Russian phrases. It is based on the [bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-cased) model and was trained for 20 epochs on the following datasets:

- [evilfreelancer/opus-php-en-ru-cleaned](https://huggingface.co/datasets/evilfreelancer/opus-php-en-ru-cleaned) (train): 1.6k lines
- [evilfreelancer/golang-en-ru](https://huggingface.co/datasets/evilfreelancer/golang-en-ru) (train): 554 lines
- [Helsinki-NLP/opus_books](https://huggingface.co/datasets/Helsinki-NLP/opus_books/viewer/en-ru) (en-ru, train): 17.5k lines

The goal of this model is to generate identical or very similar embeddings regardless of whether the text is written in English or Russian.
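
One way to check this goal in practice is to embed an English phrase and its Russian counterpart and compare the two vectors with cosine similarity. The snippet below is a minimal sketch, not part of the original project: the helper names `cosine_similarity`, `embed`, and `phrase_similarity` are hypothetical, and it assumes the `ollama` Python package is installed and a local Ollama server has the model pulled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embed(text, model="evilfreelancer/enbeddrus"):
    """Return the embedding of `text` from a local Ollama server (assumed running)."""
    import ollama  # imported lazily so cosine_similarity works without the package

    response = ollama.embeddings(model=model, prompt=text)
    return response["embedding"]


def phrase_similarity(text_a, text_b, embed_fn=embed):
    """Embed both phrases with `embed_fn` and return their cosine similarity."""
    return cosine_similarity(embed_fn(text_a), embed_fn(text_b))


# Example call (needs a running Ollama server; the Russian sentence is an
# illustrative translation, not taken from the training data):
# print(phrase_similarity(
#     "The sky is blue because of Rayleigh scattering",
#     "Небо голубое из-за рэлеевского рассеяния",
# ))
```

A value close to 1.0 for a translated pair, and a noticeably lower value for an unrelated pair, would be consistent with the stated goal. The actual scores depend on the model version and are not claimed here.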

## Model Versions

Three versions of the model are available:

- v0.1 - trained only on Russian/English parallel corpora, with PHP pairs
- v0.1-domain - first trained on technical-domain text, then the resulting model was trained on Russian/English parallel corpora
- v0.2 - trained only on Russian/English parallel corpora, with PHP and GoLang pairs

## Usage

This model is an embedding model, meaning it can only be used to generate embeddings.

**REST API**

```bash
curl http://localhost:11434/api/embeddings -d '{
  "model": "evilfreelancer/enbeddrus",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'
```

**Python library**

```python
import ollama
ollama.embeddings(model='evilfreelancer/enbeddrus', prompt='The sky is blue because of Rayleigh scattering')
```

**JavaScript library**

```javascript
import ollama from 'ollama'
await ollama.embeddings({ model: 'evilfreelancer/enbeddrus', prompt: 'The sky is blue because of Rayleigh scattering' })
```
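
If the `ollama` client library is not available, the REST call can also be made with the Python standard library alone. This is a sketch under a few assumptions: the endpoint and JSON fields are copied from the curl example above, the function names `build_request` and `embed_via_rest` are hypothetical, and the response is assumed to carry the vector under the `embedding` key, as the `/api/embeddings` endpoint does.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port for your setup.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_request(text, model="evilfreelancer/enbeddrus", url=OLLAMA_URL):
    """Build the POST request that asks Ollama to embed `text`."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_via_rest(text, timeout=30.0):
    """Send the request to a running Ollama server and return the vector."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["embedding"]
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.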

## Links

- https://github.com/EvilFreelancer/enbeddrus
- https://huggingface.co/evilfreelancer/enbeddrus-v0.1
- https://huggingface.co/evilfreelancer/enbeddrus-v0.1-domain
