This model is an Ollama version of heydariAI-persian-embeddings from Hugging Face. It is a fine-tuned version of xlm-roberta-base, trained on a large corpus of Persian text to produce contextual embeddings for Persian-language use.

Persian Embeddings for Ollama

This repository hosts a Persian (Farsi) sentence embedding model, converted from the original heydariAI/persian-embeddings Hugging Face model into GGUF format and ported to Ollama for easy local use.

With this model, you can generate vector embeddings for Persian text. These are useful for semantic search, clustering, classification, recommendation systems, and more.


✨ Features

  • 📌 Pretrained on large-scale Persian corpora
  • 📌 Optimized for sentence-level embeddings
  • 📌 Runs locally with Ollama (no external API needed)
  • 📌 Available in GGUF format for fast inference

🚀 Quick Start

  1. Pull the model from Ollama:
     ollama pull aligh4699/heydariAI-persian-embeddings
  2. Use the embeddings for:

    • Semantic search
    • Text similarity
    • Clustering
    • Downstream ML/NLP tasks
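
The semantic search use case listed above can be sketched as follows. This is a minimal, dependency-free illustration, not part of the original model card: the helper names (`embed_texts`, `rank_documents`, `search`) are hypothetical, and the sketch assumes the `ollama` Python package is installed and a local Ollama server has already pulled the model.

```python
import math

# Model name taken from the pull command above.
MODEL = "aligh4699/heydariAI-persian-embeddings"


def embed_texts(texts):
    """Embed a list of strings with a single call to the local Ollama server."""
    import ollama  # imported lazily so the ranking helpers work without it

    return ollama.embed(model=MODEL, input=texts)["embeddings"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def search(query, documents, top_k=3):
    """Embed the query and documents together, then return the best matches."""
    vectors = embed_texts([query] + list(documents))
    ranked = rank_documents(vectors[0], vectors[1:])
    return [(documents[i], score) for i, score in ranked[:top_k]]
```

Calling `search("...", ["...", "..."])` with Persian strings would return the documents closest to the query together with their scores. For larger collections, the document embeddings would normally be computed once and stored rather than re-embedded on every query.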

📖 Example: Semantic Similarity

import numpy as np
import ollama
from numpy.linalg import norm

MODEL = "aligh4699/heydariAI-persian-embeddings"

def calculate_cosine_sim(emb_a, emb_b):
    # Dot product divided by the product of the two vector norms
    return np.dot(emb_a, emb_b) / (norm(emb_a) * norm(emb_b))

def get_embedding(text):
    # ollama.embed returns a list of embeddings; a single input yields one entry
    return np.array(ollama.embed(model=MODEL, input=text)["embeddings"][0])

if __name__ == "__main__":
    # Get embeddings
    # Sentence 1: "Hello world. It is a very beautiful morning."
    emb1 = get_embedding("سلام دنیا. صبح بسیار زیبایی است.")
    # Sentence 2: "Greetings, world. May your morning be fresh."
    emb2 = get_embedding("درود جهان. صبحتان پرطراوت باشد.")
    # Sentence 3: "Goodbye forever. I hate today."
    emb3 = get_embedding("خداحاظ تا ابد. من از امروز متنفرم.")

    # Cosine similarity
    similarity1 = calculate_cosine_sim(emb1, emb2)
    similarity2 = calculate_cosine_sim(emb1, emb3)
    similarity3 = calculate_cosine_sim(emb2, emb3)

    print("Cosine similarity (1,2): ", similarity1)
    print("Cosine similarity (1,3): ", similarity2)
    print("Cosine similarity (2,3): ", similarity3)


Cosine similarity (1,2):  0.8702924086504231
Cosine similarity (1,3):  0.38726373879395914
Cosine similarity (2,3):  0.46212684228451056
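
In the output above, the two greetings (sentences 1 and 2) score much higher with each other than either does with the farewell (sentence 3). The same comparison can be generalized to any number of sentences. The sketch below is an illustration under assumptions, not the model author's code: `embed_batch` and `pairwise_similarities` are hypothetical helper names, and `embed_batch` assumes the `ollama` Python package and a running local server.

```python
import math
from itertools import combinations


def embed_batch(sentences, model="aligh4699/heydariAI-persian-embeddings"):
    """Embed several sentences in one request (requires a local Ollama server)."""
    import ollama  # imported lazily; only needed when embeddings are requested

    return ollama.embed(model=model, input=sentences)["embeddings"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pairwise_similarities(vectors):
    """Map each index pair (i, j) with i < j to the similarity of those vectors."""
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }
```

Passing the result of `embed_batch([...])` to `pairwise_similarities` yields one score per sentence pair, keyed by zero-based indices.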


⚖️ License & Credits

  • Original model: heydariAI/persian-embeddings
  • Converted to GGUF and ported to Ollama by Ali Reza Ghasemi (aligh4699)
  • License: Please refer to the original model’s license before using this model in commercial projects

❤️ Acknowledgments

Thanks to:

  • HeydariAI for creating the original Persian embeddings model
  • Ollama for enabling easy local model deployment
