TowerInstruct is an Open Multilingual Large Language Model for Translation-Related Tasks by Unbabel.
2,523 Pulls Updated 6 months ago
Updated 6 months ago
6 months ago
3aa5a6387d3b · 3.8GB
Readme
TowerInstruct-7B is a language model that results from fine-tuning TowerBase on the TowerBlocks supervised fine-tuning dataset. TowerInstruct-7B-v0.1 is the first model in the series. The model is trained to handle several translation-related tasks, such as general machine translation (e.g., sentence- and paragraph/document-level translation, terminology-aware translation, context-aware translation), automatic post edition, named-entity recognition, grammatical error correction, and paraphrase generation. We will release more details in the upcoming technical report. For now, you can check the results obtained with the model here.
- Developed by: Unbabel, Instituto Superior Técnico, CentraleSupélec University of Paris-Saclay
- Model type: A 7B parameter model fine-tuned on a mix of publicly available, synthetic datasets on translation-related tasks, as well as conversational datasets and code instructions.
- Language(s) (NLP): English, Portuguese, Spanish, French, German, Dutch, Italian, Korean, Chinese, Russian
- License: CC-BY-NC-4.0, Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
- Finetuned from model: TowerBase
Update: TowerInstruct-7B-v0.2 has more reliable document-level translation capabilities in comparison with TowerInstruct-7B-v0.1. The new version of TowerBlocks used to train v0.2 is also available in the Tower collection.
Note: TowerInstruct-v0.2 was trained using the ChatML prompt templates without any system prompts.
Intended uses & limitations
The model was initially fine-tuned on a filtered and preprocessed supervised fine-tuning dataset (TowerBlocks), which contains a diverse range of data sources:
- Translation (sentence and paragraph-level)
- Automatic Post Edition
- Machine Translation Evaluation
- Context-aware Translation
- Terminology-aware Translation
- Multi-reference Translation
- Named-entity Recognition
- Paraphrase Generation
- Synthetic Chat data
- Code instructions