Trained with SFT followed by DPO preference tuning on the MainframeBench dataset: https://huggingface.co/datasets/Fsoft-AIC/MainframeBench/
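
A minimal sketch of pulling the dataset with the Hugging Face `datasets` library; the subset name passed to `load_dataset` is an assumption, since the source gives only the dataset URL and the dataset page lists several subsets.

```python
from datasets import load_dataset

# Subset name is an assumption; check the dataset page for the exact configs.
ds = load_dataset("Fsoft-AIC/MainframeBench", "question_answering")
print(ds)  # split sizes should line up with the train/test counts below
```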
Training Config
- Training Set Size: 2,270
- Test Set Size: 253
- learning_rate_multiplier: 1
- num_epochs: 4
- batch_size: auto
- preference_tuning_variant: APO Zero
- preference_tuning_learning_rate_multiplier: 1
- preference_tuning_num_epochs: 1
- preference_tuning_training_beta: 0.1
- preference_tuning_adapter_weight: 0.5
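
The `preference_tuning_*` values map naturally onto TRL's `DPOConfig`, which implements APO Zero as a DPO loss variant (`loss_type="apo_zero"`). Below is a minimal sketch of the preference-tuning stage, assuming TRL's `DPOTrainer`, a stand-in base checkpoint, toy preference pairs, and a concrete base learning rate (the config above gives only a multiplier of 1); the `adapter_weight` knob has no direct TRL equivalent and is omitted. The SFT stage would be set up analogously with TRL's `SFTTrainer`.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Stand-in checkpoint: the source does not name the base model.
# In practice this would be the output of the SFT stage.
model_name = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Toy preference pairs in the prompt/chosen/rejected format DPOTrainer
# expects; real pairs would be built from MainframeBench responses.
pairs = Dataset.from_dict({
    "prompt":   ["What does JCL stand for on IBM mainframes?"],
    "chosen":   ["JCL stands for Job Control Language."],
    "rejected": ["JCL stands for Java Class Library."],
})

config = DPOConfig(
    output_dir="apo-zero-out",
    loss_type="apo_zero",  # preference_tuning_variant: APO Zero
    beta=0.1,              # preference_tuning_training_beta: 0.1
    num_train_epochs=1,    # preference_tuning_num_epochs: 1
    learning_rate=5e-6,    # assumed base rate; source gives only a multiplier
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=pairs,
    processing_class=tokenizer,  # `tokenizer=` on older TRL releases
)
trainer.train()
```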