# EuroLLM-9B-Instruct on Ollama

## Overview
EuroLLM-9B-Instruct is part of the EuroLLM project, which aims to create a suite of large language models (LLMs) capable of understanding and generating text in all European Union languages, as well as additional relevant languages. This model has been fine-tuned for instruction-following tasks, making it highly effective for multilingual and machine translation use cases.
## Features
- Multilingual Support: Capable of processing 35+ languages, including all European Union languages and others such as Arabic, Chinese, Hindi, Japanese, Korean, Russian, and Turkish.
- Instruction Fine-Tuning: Trained on the EuroBlocks dataset, focused on general instruction-following and machine translation tasks.
- State-of-the-Art Performance: Achieves competitive results on multilingual benchmarks and outperforms many European-developed models.
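To show what this looks like in practice, here is a minimal sketch that queries a locally served copy of the model through Ollama's REST API. The tag `eurollm-9b-instruct` is a placeholder, not the confirmed tag; substitute whatever tag the model is published under on Ollama.

```python
# Minimal sketch: prompt EuroLLM-9B-Instruct through Ollama's REST API.
# Assumes `ollama serve` is running locally on the default port and that
# the model was pulled under the (hypothetical) tag "eurollm-9b-instruct".
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "eurollm-9b-instruct",  # placeholder tag
        "prompt": "Summarise the goals of the EuroLLM project in Portuguese.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```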
## Technical Details
- Architecture: Dense Transformer with 9.15 billion parameters.
- Optimization:
  - Grouped Query Attention (GQA): Enhances inference speed while maintaining performance.
  - Pre-layer Normalization: Improves training stability.
  - RMSNorm: Fast and efficient normalization.
  - SwiGLU Activation: Delivers high performance on downstream tasks.
  - Rotary Positional Embeddings (RoPE): Improves context handling and enables extended context length.
- Quantization: Uses `Q4_K_M` quantization for efficient inference on limited hardware.
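To make the GQA figures concrete: 32 query heads share 8 key-value heads, so each KV head serves a group of 4 query heads, with a head dimension of 4096 / 32 = 128. The sketch below illustrates only the grouping, not EuroLLM's actual implementation (no causal mask, for brevity):

```python
# Illustration of grouped-query attention head mapping: 32 query heads
# share 8 key-value heads, i.e. 4 query heads per KV head.
import numpy as np

n_q_heads, n_kv_heads, head_dim, seq = 32, 8, 128, 16
group = n_q_heads // n_kv_heads  # 4 query heads per KV head

q = np.random.randn(n_q_heads, seq, head_dim)
k = np.random.randn(n_kv_heads, seq, head_dim)
v = np.random.randn(n_kv_heads, seq, head_dim)

# Each query head h attends using the shared KV head h // group.
out = np.empty_like(q)
for h in range(n_q_heads):
    kv = h // group
    scores = q[h] @ k[kv].T / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out[h] = weights @ v[kv]
```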
## Model Hyperparameters

| Attribute | Value |
|---|---|
| Sequence Length | 4,096 tokens |
| Number of Layers | 42 |
| Embedding Size | 4,096 |
| Hidden Size (FFN) | 12,288 |
| Attention Heads | 32 |
| Key-Value Heads (GQA) | 8 |
| Activation Function | SwiGLU |
| Positional Encoding | RoPE |
| Parameters | 9.15B |
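As a sanity check, these hyperparameters roughly reproduce the 9.15B parameter count. The vocabulary size is not listed on this page, so the 128,000 figure below is an assumption (the size reported for the EuroLLM tokenizer), as are untied input/output embeddings:

```python
# Back-of-the-envelope parameter count from the hyperparameters above.
# Norm parameters are ignored (negligible).
d_model, d_ffn, n_layers = 4096, 12288, 42
n_heads, n_kv_heads = 32, 8
head_dim = d_model // n_heads                    # 128
vocab = 128_000                                  # assumption, not from this page

attn = d_model * d_model                         # W_q
attn += 2 * d_model * (n_kv_heads * head_dim)    # W_k, W_v (shrunk by GQA)
attn += d_model * d_model                        # W_o
ffn = 3 * d_model * d_ffn                        # SwiGLU: gate, up, down projections
per_layer = attn + ffn                           # ~192.9M

total = n_layers * per_layer + 2 * vocab * d_model  # + in/out embeddings
print(f"{total / 1e9:.2f}B parameters")             # ~9.15B, matching the table
```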
## Performance Highlights

EuroLLM-9B-Instruct excels in:

- Multilingual Benchmarks: Achieves top rankings on tasks such as MMLU-Pro and MUSR, demonstrating strong understanding across multiple languages.
- English Benchmarks: Matches or exceeds the performance of leading models like Mistral-7B in English-language tasks.
- Instruction Following: Fine-tuned for understanding and responding to complex instructions in diverse contexts.
## Quantization

The model is quantized using `Q4_K_M`, significantly reducing memory requirements while maintaining high performance. This makes it accessible for deployment on machines with limited hardware resources. The quantized model has a file size of approximately 5.6GB.
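For a rough sanity check on that figure: Q4_K_M is a mixed-precision llama.cpp format that averages a little under 5 bits per weight (an approximate figure, not stated on this page), which lines up with the listed file size:

```python
# Rough file-size estimate for the Q4_K_M quantization.
# bits_per_weight is an approximation for Q4_K_M, not an official figure.
params = 9.15e9
bits_per_weight = 4.85
print(f"~{params * bits_per_weight / 8 / 1e9:.1f} GB")  # ~5.5 GB vs. the listed 5.6GB
```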
## Applications

EuroLLM-9B-Instruct can be utilized across a wide range of applications:

- Machine Translation: Accurate and reliable translations between supported languages (see the sketch after this list).
- Content Generation: Generating creative text in various formats such as blogs, articles, or stories in multiple languages.
- Multilingual Support Systems: Enhancing customer support by responding to queries in users’ native languages.
- Language Learning Tools: Assisting learners with translation, grammar correction, and language practice.
- Cultural Research and Analysis: Providing contextual insights and descriptions about different cultures, languages, and historical topics.
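As an example of the machine-translation use case, here is a small sketch built on Ollama's `/api/chat` endpoint. The `translate` helper and the model tag are illustrative assumptions, not part of any official API:

```python
# Sketch of a machine-translation call through Ollama's chat endpoint.
import requests

def translate(text: str, target_lang: str) -> str:
    """Ask the model to translate `text` into `target_lang`."""
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "eurollm-9b-instruct",  # placeholder tag
            "messages": [
                {"role": "user",
                 "content": f"Translate the following text to {target_lang}:\n{text}"},
            ],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(translate("The meeting is postponed until Thursday.", "German"))
```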
## Model Details

EuroLLM-9B-Instruct was developed through a collaborative effort by leading research institutions and organizations:

- Developers: Unbabel, Instituto Superior Técnico, Instituto de Telecomunicações, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
- Funded By: The European Union.
## Model Specifications

| Attribute | Details |
|---|---|
| Parameters | 9.15 billion |
| Languages Supported | 35+ |
| Sequence Length | 4,096 tokens |
| Number of Layers | 42 |
| Embedding Size | 4,096 |
| Attention Heads | 32 |
| Optimization | Adam optimizer with BF16 precision |
| Hardware | 400 NVIDIA H100 GPUs |
## Known Limitations

While EuroLLM-9B-Instruct is highly capable, it has certain limitations:

- Bias and Risks: As it has not been aligned to human preferences, the model may occasionally produce biased or inappropriate outputs.
- Hallucinations: The model might generate factually incorrect or unverifiable information.
- Low-Resource Languages: Performance may vary for languages with less representation in the training data.
## License
This model is released under the Apache License 2.0, permitting use for both academic and commercial purposes. Ensure compliance with the license terms when using the model.
## Support and Feedback

If you encounter issues or have suggestions, please reach out via:

- Ollama GitHub Discussions
- Hugging Face Community
Thank you for using EuroLLM-9B-Instruct! Your feedback and support contribute to advancing multilingual AI capabilities.