Mirrored from https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-f16


Model Summary
Llama3-8B-Chinese-Chat is an instruction-tuned language model for Chinese & English users, built upon the Meta-Llama-3-8B-Instruct model, with various abilities such as roleplaying and tool use.

Developed by: Shenzhi Wang (王慎执) and Yaowei Zheng (郑耀威)

License: Llama-3 License
Base Model: Meta-Llama-3-8B-Instruct
Model Size: 8.03B
Context length: 8K
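
As a quick way to try the f16 GGUF weights locally, the sketch below loads them with llama-cpp-python and asks a question in Chinese. The file name is an assumption and should be replaced with the path of the GGUF file you downloaded.

```python
# Minimal sketch (assumption: the GGUF file was downloaded locally as
# "Llama3-8B-Chinese-Chat-f16.gguf"; adjust the path to your actual file).
from llama_cpp import Llama

llm = Llama(
    model_path="Llama3-8B-Chinese-Chat-f16.gguf",
    n_ctx=8192,        # matches the model's 8K context length
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

# llama.cpp typically picks up the chat template embedded in the GGUF
# metadata, so the Llama-3 prompt format is applied automatically.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "用中文介绍一下你自己。"}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```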
1. Introduction
❗️❗️❗️NOTICE: The main branch contains the f16 GGUF version of Llama3-8B-Chinese-Chat-v2. If you want to use our f16 GGUF version of Llama3-8B-Chinese-Chat-v1, please refer to the v1 branch.

This is the official f16 GGUF version of Llama3-8B-Chinese-Chat-v2, the first Chinese chat model fine-tuned specifically for Chinese through ORPO [1] on top of the Meta-Llama-3-8B-Instruct model.

Compared to the original Meta-Llama-3-8B-Instruct model, our Llama3-8B-Chinese-Chat model significantly reduces the issues of "Chinese questions with English answers" and the mixing of Chinese and English in responses. It also greatly reduces the number of emojis in its answers, making the responses more formal.

Compared to Llama3-8B-Chinese-Chat-v1, our Llama3-8B-Chinese-Chat-v2 model significantly increases the training data size (from 20K to 100K), which brings notable performance improvements, especially in roleplay, function calling, and math.

[1] Hong, Jiwoo, Noah Lee, and James Thorne. “Reference-free Monolithic Preference Optimization with Odds Ratio.” arXiv preprint arXiv:2403.07691 (2024).
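
For reference, the ORPO objective from [1] adds an odds-ratio preference term, weighted by λ (the "orpo beta" listed in the training details below), to the standard supervised fine-tuning loss:

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\big],
\qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)},
$$

where \(y_w\) and \(y_l\) are the preferred and rejected responses for prompt \(x\).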

Training framework: LLaMA-Factory (commit id: 32347901d4af94ccd72b3c7e1afaaceb5cb3d26a).

Training details (see the illustrative sketch after this list):

epochs: 3
learning rate: 5e-6
learning rate scheduler type: cosine
warmup ratio: 0.1
cutoff len (i.e. context length): 8192
orpo beta (i.e. \(\lambda\) in the ORPO paper): 0.05
global batch size: 64
fine-tuning type: full parameters
optimizer: paged_adamw_32bit
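
The model itself was trained with LLaMA-Factory at the commit noted above. Purely as an illustration of how these hyperparameters fit together, below is a minimal sketch of an equivalent ORPO run using the trl library's ORPOTrainer instead; the preference dataset name, output directory, and per-device/accumulation split of the global batch size of 64 are placeholders, not the authors' actual setup.

```python
# Illustrative only: the authors used LLaMA-Factory, not this trl-based script.
# Dataset name, output directory, and batch-size split are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder: a preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset", split="train")

config = ORPOConfig(
    output_dir="llama3-8b-chinese-chat-orpo",  # placeholder
    num_train_epochs=3,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_length=8192,                 # cutoff len / context length
    beta=0.05,                       # lambda in the ORPO paper
    optim="paged_adamw_32bit",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # tune these two (and the GPU count)
                                     # to reach a global batch size of 64
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```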
To reproduce the model