Qwen Arabic Fine-tuning Project

This project fine-tunes the Qwen2-1.5B model for Arabic language tasks using Quantized LoRA (QLoRA).

Qwen-Arabic Evaluation on ArabicMMLU

This repository includes an evaluation of the Qwen-Arabic language model (1.5B parameters) on the ArabicMMLU benchmark. The model demonstrates strong parameter efficiency while maintaining competitive accuracy across a range of knowledge domains.

Model Overview

Qwen-Arabic is a 1.5B parameter language model fine-tuned for Arabic language tasks. It is based on the Qwen architecture and optimized using QLoRA (Quantized Low-Rank Adaptation) techniques.
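
Conceptually, QLoRA loads the base model in 4-bit precision and trains only small low-rank adapter matrices on top of it. As a rough illustration (not the exact configuration in src/finetune_qwen.py; the rank, alpha, and target modules below are assumed values), a minimal setup with Transformers and PEFT looks like this:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    # Load the base model quantized to 4-bit NF4 (the "Quantized" part of QLoRA).
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2-1.5B",
        quantization_config=bnb_config,
        device_map="auto",
    )

    # Attach low-rank adapters; only these weights are trained.
    # The rank and target modules here are illustrative assumptions.
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of all weights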

Performance Results

Overall Performance

  • Average Accuracy: 42.3%
  • Best Category: Social Science (46.1%)
  • Most Challenging: Arabic Language (37.8%)

Category-wise Performance

Category         Accuracy (%)
---------------  ------------
STEM             42.2
Social Science   46.1
Humanities       41.8
Arabic Language  37.8
Other            42.9
Average          42.3

Efficiency Analysis

  • Performance per billion parameters: 28.2 accuracy points (42.3% ÷ 1.5B)
  • Roughly 389× more parameter-efficient than GPT-4 by this measure
  • Achieves 58.3% of GPT-4's average accuracy with only 0.15% of its parameters

Comparison with Other Models

Model        Parameters  Average Accuracy  Efficiency Score
-----------  ----------  ----------------  ----------------
GPT-4        ~1000B      72.5%             0.072
Jais-chat    30B         62.3%             2.077
AceGPT-chat  13B         52.6%             4.046
Qwen-Arabic  1.5B        42.3%             28.200
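
The efficiency score in both tables is simply average accuracy divided by parameter count in billions. A quick sanity check that reproduces the column (values match up to rounding):

    # Efficiency score = average accuracy (%) / parameters (billions).
    models = {
        "GPT-4": (1000, 72.5),
        "Jais-chat": (30, 62.3),
        "AceGPT-chat": (13, 52.6),
        "Qwen-Arabic": (1.5, 42.3),
    }
    for name, (params_b, accuracy) in models.items():
        print(f"{name}: {accuracy / params_b:.3f}")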

Prerequisites

  • Ubuntu (or similar Linux distribution)
  • Python 3.10
  • CUDA-compatible GPU with at least 4GB VRAM
  • At least 12GB system RAM
  • Ollama installed and configured

Setup

  1. Clone this repository:

    git clone https://github.com/prakash-aryan/qwen-arabic-project.git
    cd qwen-arabic-project
    
  2. Create and activate a virtual environment:

    python3.10 -m venv qwen_env
    source qwen_env/bin/activate
    
  3. Install the required packages:

    pip install --upgrade pip
    pip install -r requirements.txt
    
  4. Install PyTorch with CUDA support:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
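
After installation, it is worth confirming that PyTorch can actually see the GPU before starting a long fine-tuning run:

    python -c "import torch; print(torch.cuda.is_available())"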
    

Project Structure

qwen-arabic-project/
├── data/
│   └── arabic_instruction_dataset/
├── models/
├── results/
├── src/
│   ├── compare_qwen_models.py
│   ├── evaluate_arabic_model.py
│   ├── finetune_qwen.py
│   ├── get_datasets.py
│   ├── load_and_merge_model.py
│   ├── preprocess_datasets.py
│   └── validate_dataset.py
├── tools/
│   └── llama-quantize
├── requirements.txt
├── run_pipeline.sh
├── Modelfile
└── README.md

Usage

  1. Download and prepare datasets:

    python src/get_datasets.py
    
  2. Preprocess and combine datasets:

    python src/preprocess_datasets.py
    
  3. Validate the dataset:

    python src/validate_dataset.py
    
  4. Fine-tune the model (with --batch_size 1 and --gradient_accumulation_steps 16 below, the effective batch size is 16):

    python src/finetune_qwen.py --data_path ./data/arabic_instruction_dataset --output_dir ./models/qwen2_arabic_finetuned --num_epochs 3 --batch_size 1 --gradient_accumulation_steps 16 --learning_rate 2e-5
    
  5. Load and merge the fine-tuned model (a PEFT merge sketch follows this list):

    python src/load_and_merge_model.py
    
  6. Convert to GGUF format:

    python src/convert_hf_to_gguf.py ./models/qwen2_arabic_merged_full --outfile ./models/qwen_arabic_merged_full.gguf
    
  7. Quantize the model:

    ./tools/llama-quantize ./models/qwen_arabic_merged_full.gguf ./models/qwen_arabic_merged_full_q4_k_m.gguf q4_k_m
    
  8. Create the Ollama model (a sample Modelfile sketch follows this list):

    ollama create qwen-arabic-custom -f Modelfile
    
  9. Evaluate the model:

    python src/evaluate_arabic_model.py
    
  10. Compare models:

    python src/compare_qwen_models.py
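
For reference, the load-and-merge step (step 5) typically folds the trained LoRA adapter back into the base model so it can be exported as a single checkpoint. A minimal sketch with PEFT, assuming the adapter was saved to ./models/qwen2_arabic_finetuned (the paths and dtype are assumptions; see src/load_and_merge_model.py for the actual logic):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Load the base model at full (non-quantized) precision so the merged
    # weights can be saved as a standalone checkpoint.
    base = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2-1.5B", torch_dtype=torch.float16
    )

    # Apply the trained adapter, then fold its weights into the base model.
    model = PeftModel.from_pretrained(base, "./models/qwen2_arabic_finetuned")
    merged = model.merge_and_unload()

    # Save the merged model (and tokenizer) for GGUF conversion in step 6.
    merged.save_pretrained("./models/qwen2_arabic_merged_full")
    AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B").save_pretrained(
        "./models/qwen2_arabic_merged_full"
    )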
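Similarly, the Modelfile in step 8 tells Ollama where the quantized GGUF file lives and how to format prompts. A minimal sketch (the repository's actual Modelfile may use different parameters or a fuller template; Qwen2 models conventionally use the ChatML format):

    FROM ./models/qwen_arabic_merged_full_q4_k_m.gguf

    # Sampling defaults below are illustrative assumptions.
    PARAMETER temperature 0.7

    TEMPLATE """<|im_start|>user
    {{ .Prompt }}<|im_end|>
    <|im_start|>assistant
    """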
    

Running the Full Pipeline

To run the entire pipeline from data preparation to model evaluation, use the provided shell script:

chmod +x run_pipeline.sh
./run_pipeline.sh
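
In essence, the script chains the commands from the Usage section. A plausible sketch of its contents, built only from the steps documented above (the actual script may add logging or error checks):

    #!/bin/bash
    set -e  # abort on the first failing step

    python src/get_datasets.py
    python src/preprocess_datasets.py
    python src/validate_dataset.py
    python src/finetune_qwen.py --data_path ./data/arabic_instruction_dataset \
        --output_dir ./models/qwen2_arabic_finetuned --num_epochs 3 \
        --batch_size 1 --gradient_accumulation_steps 16 --learning_rate 2e-5
    python src/load_and_merge_model.py
    python src/convert_hf_to_gguf.py ./models/qwen2_arabic_merged_full \
        --outfile ./models/qwen_arabic_merged_full.gguf
    ./tools/llama-quantize ./models/qwen_arabic_merged_full.gguf \
        ./models/qwen_arabic_merged_full_q4_k_m.gguf q4_k_m
    ollama create qwen-arabic-custom -f Modelfile
    python src/evaluate_arabic_model.py
    python src/compare_qwen_models.py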

Notes

  • Ensure you have sufficient disk space for the datasets and model files.
  • The fine-tuning process can take several hours to days, depending on your hardware.
  • Monitor GPU memory usage during fine-tuning and adjust batch size or gradient accumulation steps if necessary.
  • Make sure to have Ollama installed for the model creation and evaluation steps.

Troubleshooting

  • If you encounter CUDA out-of-memory errors, reduce the per-device batch size; you can raise the gradient accumulation steps in tandem to keep the effective batch size unchanged.
  • For any other issues, please check the error logs or open an issue in the GitHub repository.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).

This means:

  • You can use, modify, and distribute this software.
  • If you distribute modified versions, you must also distribute them under the GPL-3.0.
  • You must include the original copyright notice and the license text.
  • You must disclose your source code when you distribute the software.
  • There is no warranty for this free software.

For more details, see the LICENSE file in this repository or the GNU GPL v3.0 license text.

Acknowledgements

This project uses the following main libraries and tools:

  • Transformers by Hugging Face
  • PyTorch
  • PEFT (Parameter-Efficient Fine-Tuning)
  • Ollama
  • GGUF (for model conversion)