Models from Taiwanese Hokkien LLM
Credits go to Bohanlu
The Taigi-Llama-2-Chat series is built on the Taigi-Llama-2 series of models. We first created the chat vector from the original meta-Llama-2 model, using the method introduced in "Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages", so that the model can avoid toxic replies. We then performed instruction fine-tuning on an instruction-tuning dataset of around 100k examples in Taiwanese Hokkien Hanzi to obtain a chat model specifically for Taiwanese Hokkien. The dataset was generated from Traditional Chinese instruction-tuning datasets using the translation model Bohanlu/Taigi-Llama-2-Translator-13B.
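The chat vector step can be illustrated with simple weight arithmetic. The following is a minimal sketch, not the training code used for these models: it assumes the chat vector is the element-wise difference between a chat model's weights and its base model's weights, which is then added to the weights of a continually pretrained model. The function names, the `scale` parameter, and the toy weight values are all hypothetical and chosen for illustration only.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def chat_vector(chat: Weights, base: Weights) -> Weights:
    """Return the per-parameter difference chat - base (hypothetical helper)."""
    return {
        name: [c - b for c, b in zip(chat[name], base[name])]
        for name in base
    }


def apply_chat_vector(target: Weights, vector: Weights, scale: float = 1.0) -> Weights:
    """Add the (optionally scaled) chat vector to the target model's weights."""
    return {
        name: [t + scale * v for t, v in zip(target[name], vector[name])]
        for name in target
    }


# Toy values only; real models hold billions of parameters per state dict.
base = {"layer.weight": [1.0, 2.0, 3.0]}    # stands in for the base Llama-2 weights
chat = {"layer.weight": [1.5, 2.0, 2.0]}    # stands in for the Llama-2 chat weights
taigi = {"layer.weight": [0.0, 4.0, 3.0]}   # stands in for the Taigi-Llama-2 weights

tau = chat_vector(chat, base)
merged = apply_chat_vector(taigi, tau)
print(tau)
print(merged)
```

In this sketch the merged weights would serve only as a starting point; the instruction fine-tuning described above is a separate step that follows.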
For more details, please refer to our GitHub repository.
Explore other models and datasets in the Taiwanese Hokkien LLM collection.