This model is an Ollama port of heydariAI/persian-embeddings from Hugging Face. It is a fine-tuned version of xlm-roberta-base, trained on a large corpus of Persian data to produce high-quality contextual embeddings for Persian text.

Persian Embeddings for Ollama

This repository hosts a Persian (Farsi) sentence embedding model, converted from the original heydariAI/persian-embeddings Hugging Face model into GGUF format and ported to Ollama for easy local use.

With this model, you can generate high-quality vector embeddings for Persian text. These are useful for semantic search, clustering, classification, recommendation systems, and more.

✨ Features

📌 Pretrained on large-scale Persian corpora
📌 Optimized for sentence-level embeddings
📌 Runs locally with Ollama (no external API needed)
📌 Available in GGUF format for fast inference

🚀 Quick Start

Pull the model from Ollama:

   ollama pull aligh4699/heydariAI-persian-embeddings

Use the embeddings for:
- Semantic search
- Text similarity
- Clustering
- Downstream ML/NLP tasks
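
As a rough illustration of the semantic search use case, the sketch below ranks documents against a query by cosine similarity. The helper names `embed_texts` and `top_k` are hypothetical and not part of this repository. `embed_texts` assumes the `ollama` Python package is installed and a local Ollama server is running with the model pulled; `top_k` is plain NumPy and works on any vectors.

```python
import numpy as np

MODEL = "aligh4699/heydariAI-persian-embeddings"


def embed_texts(texts, model=MODEL):
    """Embed a list of strings in one batched call; returns an (n, dim) array.

    Hypothetical helper: needs the `ollama` package and a running Ollama server.
    """
    import ollama  # imported lazily so top_k can be used without a server

    response = ollama.embed(model=model, input=texts)
    return np.array(response["embeddings"], dtype=float)


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, cosine similarity) pairs for the k most similar rows."""
    query_vec = np.asarray(query_vec, dtype=float)
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity of every document row against the query vector.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    order = np.argsort(-sims)[:k]  # highest similarity first
    return [(int(i), float(sims[i])) for i in order]
```

With the server running, `doc_vecs = embed_texts(documents)` and `query_vec = embed_texts([query])[0]` would supply the inputs to `top_k`. Embedding the documents once and reusing the array avoids re-embedding them for every query.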

📖 Example: Semantic Similarity

import numpy as np
import ollama
from numpy.linalg import norm

MODEL = "aligh4699/heydariAI-persian-embeddings"

def calculate_cosine_sim(emb_a, emb_b):
    """Cosine similarity between two 1-D embedding vectors."""
    return np.dot(emb_a, emb_b) / (norm(emb_a) * norm(emb_b))

def get_embedding(text):
    """Embed a single string and return it as a NumPy vector."""
    return np.array(ollama.embed(model=MODEL, input=text)["embeddings"][0])

if __name__ == "__main__":
    # Get embeddings
    # Sentence 1: "Hello, world. It is a very beautiful morning."
    emb1 = get_embedding("سلام دنیا. صبح بسیار زیبایی است.")
    # Sentence 2: "Greetings, world. May your morning be fresh."
    emb2 = get_embedding("درود جهان. صبحتان پرطراوت باشد.")
    # Sentence 3: "Goodbye forever. I hate today."
    emb3 = get_embedding("خداحاظ تا ابد. من از امروز متنفرم.")

    # Cosine similarity
    similarity1 = calculate_cosine_sim(emb1, emb2)
    similarity2 = calculate_cosine_sim(emb1, emb3)
    similarity3 = calculate_cosine_sim(emb2, emb3)

    print("Cosine similarity (1,2): ", similarity1)
    print("Cosine similarity (1,3): ", similarity2)
    print("Cosine similarity (2,3): ", similarity3)

Example output:

Cosine similarity (1,2):  0.8702924086504231
Cosine similarity (1,3):  0.38726373879395914
Cosine similarity (2,3):  0.46212684228451056
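
In the scores above, the two greetings (sentences 1 and 2) are far closer to each other than either is to sentence 3. When more than a handful of sentences are compared, computing each pair separately gets tedious. One common alternative, sketched here, is to normalize every vector to unit length and take a single matrix product. The function name `cosine_similarity_matrix` is hypothetical, and the toy 2-D vectors are stand-ins for real embeddings, not model output.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Return an (n, n) matrix whose entry [i, j] is the cosine similarity of rows i and j."""
    vectors = np.asarray(embeddings, dtype=float)
    # Scale each row to unit length so the dot product equals the cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


# Toy 2-D vectors standing in for real embeddings (illustrative values only).
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
print(np.round(cosine_similarity_matrix(toy), 3))
```

The result is symmetric with ones on the diagonal, so only the upper triangle needs to be inspected. Stacking `emb1`, `emb2`, and `emb3` from the example into one array would produce the same three pairwise scores in a single call.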

⚖️ License & Credits

Original model: heydariAI/persian-embeddings
Converted to GGUF and ported to Ollama by Ali Reza Ghasemi (aligh4699)
License: Please refer to the original model's license before using it in commercial projects.

❤️ Acknowledgments

Thanks to:

HeydariAI for creating the original Persian embeddings model
Ollama for enabling easy local model deployment