
EuroLLM-9B-Instruct on Ollama

Overview

EuroLLM-9B-Instruct is part of the EuroLLM project, which aims to create a suite of large language models (LLMs) capable of understanding and generating text in all European Union languages, as well as additional relevant languages. This model has been fine-tuned for instruction-following tasks, making it highly effective for multilingual and machine translation use cases.
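Once pulled into a local Ollama instance, the model can be queried over Ollama's HTTP chat endpoint. A minimal sketch that only builds the request payload, assuming a hypothetical model tag `eurollm-9b-instruct` (substitute the exact tag shown on this page):

```python
import json

# Hypothetical tag -- substitute the exact tag from this Ollama page.
payload = {
    "model": "eurollm-9b-instruct",
    "messages": [
        {"role": "user",
         "content": "Translate into Portuguese: The weather is lovely today."},
    ],
    "stream": False,  # return one complete response instead of a token stream
}
body = json.dumps(payload).encode("utf-8")

# POST the payload to a locally running Ollama server, e.g.:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/chat", data=body,
#       headers={"Content-Type": "application/json"})
#   print(json.load(urllib.request.urlopen(req))["message"]["content"])
```

The same payload works for any of the instruction-following or translation prompts described below.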

Features

  • Multilingual Support: Capable of processing 35+ languages, including all European Union languages and others such as Arabic, Chinese, Hindi, Japanese, Korean, Russian, and Turkish.
  • Instruction Fine-Tuning: Trained on the EuroBlocks dataset, focused on general instruction-following and machine translation tasks.
  • State-of-the-Art Performance: Achieves competitive results on multilingual benchmarks and outperforms many European-developed models.

Technical Details

  • Architecture: Dense Transformer with 9.15 billion parameters.
  • Optimization:
    • Grouped Query Attention (GQA): Enhances inference speed while maintaining performance.
    • Pre-layer Normalization: Improves training stability.
    • RMSNorm: Fast and efficient normalization.
    • SwiGLU Activation: Delivers high performance on downstream tasks.
    • Rotary Positional Embeddings (RoPE): Improves context handling and enables extended context length.
  • Quantization: Uses Q4_K_M quantization to run efficiently on hardware with limited memory.
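To make these components concrete, here is a toy NumPy sketch of RMSNorm, a SwiGLU feed-forward block, and GQA key/value head sharing, using scaled-down dimensions (EuroLLM-9B itself uses an embedding size of 4,096 and an FFN size of 12,288):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ffn = 256, 768      # toy sizes; EuroLLM-9B uses 4,096 and 12,288
n_heads, n_kv_heads = 32, 8    # GQA: 32 query heads share 8 key/value heads

def rms_norm(x, gain, eps=1e-6):
    # Normalize by the root mean square of each vector, then scale by a gain.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU: the swish (SiLU) of one projection gates a second projection.
    gate = x @ w_gate
    swish = gate / (1.0 + np.exp(-gate))     # equals gate * sigmoid(gate)
    return (swish * (x @ w_up)) @ w_down

x = rng.standard_normal((4, d_model))        # 4 token positions
h = rms_norm(x, gain=np.ones(d_model))
y = swiglu_ffn(h,
               rng.standard_normal((d_model, d_ffn)) * 0.02,
               rng.standard_normal((d_model, d_ffn)) * 0.02,
               rng.standard_normal((d_ffn, d_model)) * 0.02)
print(y.shape)                               # (4, 256)

# GQA: each KV head is broadcast to 32 / 8 = 4 query heads at attention time.
head_dim = d_model // n_heads
kv = rng.standard_normal((n_kv_heads, 5, head_dim))      # (kv heads, seq, dim)
kv_expanded = np.repeat(kv, n_heads // n_kv_heads, axis=0)
print(kv_expanded.shape)                     # (32, 5, 8)
```

Because the K/V projections only produce 8 heads instead of 32, the KV cache shrinks by 4x, which is where GQA's inference-speed benefit comes from.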

Model Hyperparameters

Attribute               Value
---------------------   ------------
Sequence Length         4,096 tokens
Number of Layers        42
Embedding Size          4,096
Hidden Size (FFN)       12,288
Attention Heads         32
Key-Value Heads (GQA)   8
Activation Function     SwiGLU
Positional Encoding     RoPE
Parameters              9.15B
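These hyperparameters roughly account for the stated 9.15B parameters. A back-of-the-envelope check, assuming an untied output head and a vocabulary of about 128,000 tokens (the vocabulary size is not stated on this page):

```python
# Rough parameter count from the hyperparameters in the table above.
d_model, d_ffn, n_layers = 4096, 12288, 42
n_heads, n_kv_heads = 32, 8
head_dim = d_model // n_heads                     # 128
vocab = 128_000                                   # assumption: not on this page

embed = vocab * d_model                           # input embeddings
lm_head = vocab * d_model                         # assume untied output head
attn = (d_model * d_model                         # Q projection
        + 2 * d_model * (n_kv_heads * head_dim)   # K and V (only 8 GQA heads)
        + d_model * d_model)                      # output projection
ffn = 3 * d_model * d_ffn                         # gate, up, down (SwiGLU)
total = embed + lm_head + n_layers * (attn + ffn)
print(f"{total / 1e9:.2f}B")                      # 9.15B
```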

Performance Highlights

EuroLLM-9B-Instruct excels in:

  • Multilingual Benchmarks: Achieves top rankings on tasks such as MMLU-Pro and MUSR, demonstrating strong understanding across multiple languages.
  • English Benchmarks: Matches or exceeds the performance of leading models like Mistral-7B in English-language tasks.
  • Instruction Following: Fine-tuned for understanding and responding to complex instructions in diverse contexts.

Quantization

The model is quantized using Q4_K_M, significantly reducing memory requirements while maintaining high performance. This makes it accessible for deployment on machines with limited hardware resources. The quantized model has a file size of approximately 5.6GB.
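The stated file size is consistent with Q4_K_M's typical cost of roughly 4.8-4.9 bits per weight (an approximate llama.cpp figure, not stated on this page):

```python
params = 9.15e9
bits_per_weight = 4.85          # rough average for Q4_K_M (assumption)
size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.1f} GB")      # close to the ~5.6GB file on disk
```

For comparison, the same weights in BF16 (16 bits each) would occupy about 18.3GB, so the quantized model needs roughly a third of the memory.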

Applications

EuroLLM-9B-Instruct can be utilized across a wide range of applications:

  • Machine Translation: Accurate and reliable translations between supported languages.
  • Content Generation: Generating creative text in various formats such as blogs, articles, or stories in multiple languages.
  • Multilingual Support Systems: Enhancing customer support by responding to queries in users’ native languages.
  • Language Learning Tools: Assisting learners with translation, grammar correction, and language practice.
  • Cultural Research and Analysis: Providing contextual insights and descriptions about different cultures, languages, and historical topics.

Model Details

EuroLLM-9B-Instruct was developed through a collaborative effort by leading research institutions and organizations:

  • Developers: Unbabel, Instituto Superior Técnico, Instituto de Telecomunicações, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
  • Funded By: The European Union.

Model Specifications

Attribute             Details
-------------------   ----------------------
Parameters            9.15 billion
Languages Supported   35+
Sequence Length       4,096 tokens
Number of Layers      42
Embedding Size        4,096
Attention Heads       32
Optimization          Adam with BF16
Hardware              400 NVIDIA H100 GPUs

Known Limitations

While EuroLLM-9B-Instruct is highly capable, it has certain limitations:

  • Bias and Risks: As it has not been aligned to human preferences, the model may occasionally produce biased or inappropriate outputs.
  • Hallucinations: The model might generate factually incorrect or unverifiable information.
  • Low-Resource Languages: Performance may vary for languages with less representation in the training data.

License

This model is released under the Apache License 2.0, permitting use for both academic and commercial purposes. Ensure compliance with the license terms when using the model.

Support and Feedback

If you encounter issues or have suggestions, please reach out via:

  • Ollama GitHub Discussions
  • Hugging Face Community


Thank you for using EuroLLM-9B-Instruct! Your feedback and support contribute to advancing multilingual AI capabilities.