Last update: 2025-10-16
SEA-LION is a collection of Large Language Models (LLMs) that have been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
Qwen-SEA-LION-v4-32B-IT is based on Qwen3, which provides a strong foundation with support for over 100 languages and advanced reasoning capabilities. The model underwent continued pre-training on approximately 100B tokens sampled from the SEA-Pile v2 pretraining corpus of over one trillion tokens across 7 SEA languages: Burmese, Indonesian, Malay, Filipino, Tamil, Thai, and Vietnamese. Finally, it was post-trained on a high-quality dataset of approximately 8 million question-and-answer pairs to create the final instruction-tuned model.
Qwen-SEA-LION-v4-32B-IT inherits the core features of Qwen3-32B, including its support for over 100 languages and its advanced reasoning capabilities.
SEA-LION stands for Southeast Asian Languages In One Network.
To create Qwen-SEA-LION-v4-32B-IT, we performed continued pre-training in English and SEA languages on Qwen3-32B, a decoder-only model built on the Qwen3 architecture, followed by post-training.
For tokenization, the model uses the default Qwen3-32B tokenizer.
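
Since the model follows the standard Qwen3 setup, it can be loaded like any causal LM with Hugging Face transformers. Below is a minimal usage sketch; the repository ID `aisingapore/Qwen-SEA-LION-v4-32B-IT` is assumed from this page's naming and is not confirmed by the text above.

```python
# Minimal sketch: load and query Qwen-SEA-LION-v4-32B-IT with Hugging Face transformers.
# The model ID below is an assumption based on this page's naming convention.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisingapore/Qwen-SEA-LION-v4-32B-IT"  # assumed repository ID

# The tokenizer is the default Qwen3-32B tokenizer, as noted above.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread the 32B model across available GPUs
    torch_dtype="auto",  # use the checkpoint's native precision
)

# Example prompt in one of the supported SEA languages (Malay: "How are you?").
messages = [{"role": "user", "content": "Apa khabar?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```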
For details on Qwen-SEA-LION-v4-32B-IT's performance, see the SEA-HELM leaderboard at https://leaderboard.sea-lion.ai/. For more information, see AI Singapore's Hugging Face page for this model.