Last updated: 2025-08-25
SEA-LION is a collection of Large Language Models (LLMs) that have been pretrained and instruction-tuned for the Southeast Asia (SEA) region.
Gemma-SEA-LION-v4-27B was post-trained on a dataset of question-answer (QA) pairs in Bahasa Indonesia, Burmese, Chinese, English, Khmer, Lao, Malay, Tagalog, Tamil, Thai, and Vietnamese, comprising approximately 10 million samples in total, to create Gemma-SEA-LION-v4-27B-IT.
Gemma-SEA-LION-v4-27B-IT inherits Gemma 3's:

- Large 128K context length
- Image and text understanding capabilities, including document comprehension, visual Q&A, and image-grounded reasoning
- Advanced function calling and structured outputs to allow for seamless integration into larger systems (see the sketch below)
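As a concrete illustration of the structured-output support, the following is a minimal sketch using the Ollama Python client. The model tag below is a placeholder, not an official name; substitute whatever tag you pulled the model under.

```python
# Minimal sketch: constrained JSON output via the Ollama Python client.
# "gemma-sea-lion-v4-27b-it" is a placeholder tag, not an official name.
import json

import ollama

response = ollama.chat(
    model="gemma-sea-lion-v4-27b-it",  # placeholder model tag
    messages=[
        {
            "role": "user",
            "content": (
                'Reply as JSON with keys "city" and "language". '
                "Which city and language does this sentence mention? "
                "Penduduk Jakarta kebanyakan berbahasa Indonesia."
            ),
        }
    ],
    format="json",  # ask the server to constrain decoding to valid JSON
)

print(json.loads(response["message"]["content"]))
```

Passing format="json" constrains the server to emit valid JSON; for tool use, the chat API also accepts a tools parameter, which is the usual route for function calling.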
SEA-LION stands for Southeast Asian Languages In One Network.
We performed post-training in English and SEA languages on Gemma-SEA-LION-v4-27B, a decoder-only model built on the Gemma 3 architecture, to create Gemma-SEA-LION-v4-27B-IT.
For tokenization, the model employs the default tokenizer used in Gemma 3 27B IT.
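Because the tokenizer is shared with Gemma 3 27B IT, token counts can be checked with the upstream tokenizer. A minimal sketch, assuming the Hugging Face transformers library and the google/gemma-3-27b-it checkpoint:

```python
# Minimal sketch: inspect tokenization with the upstream Gemma 3 tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-27b-it")

# Count tokens for a short Malay sentence; useful when budgeting
# prompts against the 128K-token context window.
ids = tokenizer.encode("Selamat pagi, apa khabar?")
print(len(ids), ids)
```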
As of 25 Aug 2025, Gemma-SEA-LION-v4-27B-IT excels at Southeast Asian (SEA) tasks compared with other open models under 200 billion parameters, and its performance is comparable to that of larger models and top closed models. For detailed rankings, please refer to the leaderboard.
For more details, please refer to AI Singapore's Hugging Face page for this model. The original GGUF files can be obtained from this Hugging Face repository.
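If you prefer to fetch a GGUF file programmatically, the huggingface_hub library can do so; the repo_id and filename below are illustrative placeholders, so substitute the actual values from the repository linked above.

```python
# Minimal sketch: download a GGUF file with huggingface_hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="aisingapore/Gemma-SEA-LION-v4-27B-IT-GGUF",  # placeholder repo id
    filename="gemma-sea-lion-v4-27b-it-Q4_K_M.gguf",      # placeholder filename
)
print(path)  # local cache path of the downloaded file
```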