110 6 months ago

A Sinhala-adapted version of Google’s Gemma 3 4B, continually pre-trained on 10.7M Sinhala sentences with a custom 16k vocabulary using 4-bit LoRA.

ollama run Tharusha_Dilhara_Jayadeera/singemma
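Besides the CLI, a locally running Ollama server can be queried over its HTTP API. The following is a minimal sketch, assuming the server listens on the default address `http://localhost:11434` and the model has already been pulled. The helper names `build_generate_request` and `generate` are illustrative and not part of any library.

```python
import json
import urllib.request

MODEL = "Tharusha_Dilhara_Jayadeera/singemma"
# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `generate("ශ්‍රී ලංකාවේ අගනුවර කුමක්ද?")` should return the model's reply as a single string; setting `"stream": False` asks the server for one JSON object instead of a stream of chunks.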

Details


4afed2af5ab5 · 2.5GB · gemma3 · 3.88B · Q4_K_M
<start_of_turn>user {{ .Prompt }}<end_of_turn> <start_of_turn>model
ඔබ සිංහල භාෂාවෙන් චතුර ලෙස පිළිතුරු ස
{ "stop": [ "<start_of_turn>", "<end_of_turn>" ] }
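To illustrate how the template and stop parameters above interact, here is a rough Python sketch of what happens to a user prompt before and after generation. It assumes the standard Gemma turn format with line breaks after the role name and after `<end_of_turn>`, which the flattened template above does not show. The function names are hypothetical; Ollama performs these steps internally.

```python
# Stop sequences copied from the params shown above.
STOP_SEQUENCES = ["<start_of_turn>", "<end_of_turn>"]


def render_prompt(prompt: str) -> str:
    """Wrap a user prompt in Gemma-style turn markers (line breaks are assumed)."""
    return (
        f"<start_of_turn>user\n{prompt}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


def truncate_at_stop(text: str, stops=STOP_SEQUENCES) -> str:
    """Cut generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

The stop list keeps the model from running past its own turn: generation ends as soon as it emits either marker, so the reply contains only the model's answer.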

Readme


SinGemma-Sinhala-4B-v1

SinGemma-Sinhala-4B-v1 is a Sinhala-language causal LLM built on Google’s Gemma 3 4B base model and continually pre-trained on Sinhala data to improve fluency, coherence, and usability in Sinhala NLP tasks.


Model Details

| Feature | Description |
| --- | --- |
| Model Name | SinGemma-Sinhala-4B-v1 |
| Base Architecture | Google Gemma 3, approx. 4B parameters |
| Language | Sinhala |
| Tokenizer / Vocabulary | Extended ~16,000 token vocabulary optimized for Sinhala |
| Precision / Format | Uses safetensors; supports BF16 or quantized formats (if converted) |

Intended Use Cases

  • Conversational agents / chatbots in Sinhala
  • Text generation: stories, essays, summaries in Sinhala
  • Writing assistance and text completion in Sinhala
  • Research in Sinhala natural language processing
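For the conversational use case, a multi-turn exchange can be sketched against Ollama's `/api/chat` endpoint, which takes the full message history on every call. This is a minimal sketch, assuming a local server on the default port; `build_chat_payload` and `chat_turn` are hypothetical helper names.

```python
import json
import urllib.request

MODEL = "Tharusha_Dilhara_Jayadeera/singemma"
# Assumption: Ollama is running locally on its default port.
CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(history: list, user_message: str, model: str = MODEL) -> dict:
    """Return a non-streaming /api/chat payload without mutating the history."""
    messages = list(history) + [{"role": "user", "content": user_message}]
    return {"model": model, "messages": messages, "stream": False}


def chat_turn(history: list, user_message: str) -> list:
    """Send one user turn and return the history extended with the reply."""
    payload = build_chat_payload(history, user_message)
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read().decode("utf-8"))["message"]
    return payload["messages"] + [reply]
```

Starting from `history = []`, each call to `chat_turn(history, text)` returns a new list that can be passed back in for the next turn, so the model sees the whole conversation.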