1,198 pulls · Updated 5 months ago

A 4B-parameter Persian-specialized language model built on Google's Gemma 3 architecture, fine-tuned on high-quality Persian text data while preserving multimodal capabilities for native-quality responses.

297e7b73b9be · 4.1GB · gemma3 · 3.88B · Q8_0

Template (truncated): {{- range $i, $_ := .Messages }} {{- $last := eq (len (slice $.Messages $i)) 1 }} {{- if or (eq .Rol

System prompt (truncated): شما یک دستیار هوش مصنوعی پیشرفته، متخصص در زبان فارسی و ("You are an advanced AI assistant, an expert in the Persian language and")

Parameters: { "stop": [ "<end_of_turn>" ], "temperature": 0.6 }


🌟 Gemma 3 Persian (v0)


gemma-3-persian is a Persian-specialized language model built on Google’s Gemma 3 architecture. This model has been fine-tuned on high-quality Persian text data to provide native-quality responses for Persian speakers while maintaining the multimodal capabilities of the base model.

The model uses QLoRA with 4-bit quantization to optimize for performance on consumer hardware while preserving the quality of responses in Persian.

🇮🇷 این مدل برای زبان فارسی بهینه‌سازی شده و می‌تواند به سوالات شما به صورت طبیعی پاسخ دهد.
(This model is optimized for the Persian language and can answer your questions naturally.)

⚡ Quick Start

Installation

First, ensure Ollama is installed on your system:

Linux/macOS:

curl -fsSL https://ollama.ai/install.sh | sh

Windows: Download from the official website

Pull the Model

ollama pull mshojaei77/gemma3persian

Run the Model

ollama run mshojaei77/gemma3persian

💬 Example Usage

Basic Conversation

> سلام، می‌توانی درباره تاریخ ایران به من اطلاعاتی بدهی؟
(Hello, can you give me information about the history of Iran?)

> می‌توانی یک شعر کوتاه برای من بنویسی؟
(Can you write a short poem for me?)

> این تصویر را توصیف کن: [IMAGE]
(Describe this image:)

Advanced Parameters

The ollama run command does not accept sampling flags on the command line; set parameters inside the interactive session instead:

ollama run mshojaei77/gemma3persian
>>> /set parameter temperature 0.7
>>> /set parameter top_p 0.9
>>> /set parameter num_ctx 8192
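The temperature, top_p, and context-length settings above can also be supplied programmatically through the `options` field of the Ollama Python client. A minimal sketch (it assumes a local Ollama server with the model already pulled; the helper name `ask` is ours, not part of the library):

```python
# Generation options matching the values shown above
options = {
    'temperature': 0.7,  # sampling temperature
    'top_p': 0.9,        # nucleus-sampling cutoff
    'num_ctx': 8192,     # context window in tokens
}

def ask(prompt: str) -> str:
    """Send one prompt with custom options (needs a running Ollama server)."""
    from ollama import chat  # imported here so the snippet loads without the server

    response = chat(
        model='mshojaei77/gemma3persian',
        messages=[{'role': 'user', 'content': prompt}],
        options=options,
    )
    return response['message']['content']
```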

🖥️ Programmatic Usage

Python with Ollama Library

Integrate the model directly in your Python applications:

from ollama import chat

# Initialize chat with the Persian Gemma model
response = chat(model='mshojaei77/gemma3persian', messages=[
  {
    'role': 'user',
    'content': 'سلام، می‌توانی خودت را معرفی کنی؟', # "Hello, can you introduce yourself?"
  },
])

# Print the model's response
print(response['message']['content'])
# Or access fields directly from the response object
print(response.message.content)
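The client also supports token-by-token streaming, which is useful for interactive UIs. A sketch under the same assumptions (running server, model pulled; the helper name `stream_reply` is ours): with `stream=True`, `chat` returns an iterator of partial messages instead of a single response.

```python
def stream_reply(prompt: str) -> str:
    """Print the reply as it is generated and return the full text.

    Requires the `ollama` package and a running Ollama server.
    """
    from ollama import chat  # imported here so the snippet loads without the server

    pieces = []
    for chunk in chat(
        model='mshojaei77/gemma3persian',
        messages=[{'role': 'user', 'content': prompt}],
        stream=True,  # yield partial messages as tokens arrive
    ):
        piece = chunk['message']['content']
        print(piece, end='', flush=True)
        pieces.append(piece)
    print()
    return ''.join(pieces)
```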

JavaScript/Node.js with Ollama

Use the model in your JavaScript or TypeScript applications:

import ollama from 'ollama'

async function chatWithGemmaPersian() {
  const response = await ollama.chat({
    model: 'mshojaei77/gemma3persian',
    messages: [{ 
      role: 'user', 
      content: 'لطفاً یک داستان کوتاه بنویس.' // "Please write a short story."
    }],
  })
  console.log(response.message.content)
}

chatWithGemmaPersian()

Python with LangChain Integration

Create more complex applications using LangChain's conversational memory (`ConversationChain` is deprecated in recent LangChain releases but still works for simple chains):

from langchain_ollama import ChatOllama
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Set up the Gemma Persian model
llm = ChatOllama(
    model="mshojaei77/gemma3persian",
    temperature=0.7,
    num_predict=256
)

# Add a memory component for contextual conversations
memory = ConversationBufferMemory(return_messages=True)

# Create a conversation chain with memory
conversation = ConversationChain(llm=llm, memory=memory)

# Start chatting with memory in Persian
print(conversation.run(input="سلام، حالت چطور است؟"))  # "Hello, how are you?"
print(conversation.run(input="من درباره تاریخ ایران کنجکاو هستم."))  # "I'm curious about the history of Iran."
print(conversation.run(input="می‌توانی آخرین سوال من را یادآوری کنی؟"))  # "Can you remind me of my last question?"

🔍 Capabilities

Feature | Support | Notes
🇮🇷 Persian text generation | ✅ Excellent | Optimized for natural Persian language
🖼️ Image understanding | ✅ Supported | Inherited from the base Gemma 3 model
🎯 Instruction following | ✅ Strong | Fine-tuned on instruction datasets
💭 Creative writing | ✅ Good | Poetry, stories, and creative content
🧠 Knowledge retrieval | ✅ Basic | Limited to training data
💻 Code generation | ⚠️ Limited | Better in English than in Persian

🔧 Technical Details

  • Base Model: Google Gemma 3-4B
  • Training Dataset: mshojaei77/Persian_sft (681,000+ Persian texts)
  • Fine-Tuning: QLoRA with 4-bit quantization
  • Hardware Used: T4 GPU
  • Context Length: 8,192 tokens
  • Libraries: Hugging Face Transformers, PEFT, bitsandbytes
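The training setup listed above can be sketched with those libraries. This is illustrative only: the quantization settings follow standard QLoRA practice, while the LoRA rank and target modules are our assumptions, not published values.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization (QLoRA); fp16 compute suits the T4 GPU used for training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter config; rank and target modules here are illustrative assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```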

📊 Hardware Requirements

Hardware | Minimum | Recommended
RAM | 8GB | 16GB+
GPU VRAM | 4GB | 8GB+
Disk | 4GB free | 10GB+ free
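The disk figure can be sanity-checked with a back-of-the-envelope estimate: Q8_0 stores each weight as 8 bits plus a small per-block scale (about 1.06 bytes per weight), so ~3.88B parameters land near the published 4.1GB file size.

```python
# Rough file-size estimate for a 3.88B-parameter model quantized to Q8_0
params = 3.88e9            # reported parameter count
bytes_per_weight = 1.0625  # Q8_0: 32 int8 weights + one fp16 scale per block
size_gb = params * bytes_per_weight / 1e9
print(f"~{size_gb:.1f} GB")  # close to the published 4.1GB model file
```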

⚠️ Limitations

  • 4-bit quantization may occasionally reduce precision in complex reasoning
  • Limited by the training data available in Persian
  • May generate plausible but incorrect information
  • Not extensively safety-tuned for all scenarios
  • Knowledge cutoff from the base model training

🌐 Community & Support

📜 License

This model is subject to the Gemma license from Google.