
A Sinhala-adapted version of Google’s Gemma 3 4B, continually pre-trained on 10.7M Sinhala sentences with a custom 16k vocabulary using 4-bit LoRA.

ollama run Tharusha_Dilhara_Jayadeera/singemma


SinGemma-Sinhala-4B-v1

SinGemma-Sinhala-4B-v1 is a Sinhala-language causal LLM built on Google’s Gemma 3 4B base and continually pre-trained on Sinhala data to improve fluency, coherence, and usability in Sinhala NLP tasks.


Model Details

Feature                  Description
Model Name               SinGemma-Sinhala-4B-v1
Base Architecture        Google Gemma 3, approx. 4B parameters
Language                 Sinhala
Tokenizer / Vocabulary   Extended ~16,000-token vocabulary optimized for Sinhala
Precision / Format       safetensors; supports BF16 or quantized formats (if converted)

Intended Use Cases

  • Conversational agents / chatbots in Sinhala
  • Text generation: stories, essays, summaries in Sinhala
  • Completion or assistance in Sinhala writing
  • Research in Sinhala natural language processing
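For use cases like these, the model can also be queried programmatically once it is pulled. As a minimal sketch, the snippet below builds a request payload for Ollama's local REST API (POST to /api/generate on the default port 11434); the model tag comes from the `ollama run` command above, while the Sinhala prompt (roughly "tell a short story") is an illustrative placeholder.

```python
import json

# Request payload for Ollama's local generate endpoint:
#   POST http://localhost:11434/api/generate
# "stream": False asks for a single JSON response instead of a token stream.
payload = {
    "model": "Tharusha_Dilhara_Jayadeera/singemma",
    "prompt": "කෙටි කතාවක් කියන්න",  # illustrative Sinhala prompt
    "stream": False,
}

# ensure_ascii=False keeps the Sinhala text readable in the encoded body.
body = json.dumps(payload, ensure_ascii=False)
print(body)

# To actually send it (requires a running Ollama server with the model pulled):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/generate",
#       data=body.encode("utf-8"),
#       headers={"Content-Type": "application/json"},
#   )
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```

The same payload shape works with the /api/chat endpoint by replacing `prompt` with a `messages` list, if a conversational format is preferred.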