Introduction
Nxcode-CQ-7B-orpo is a fine-tune of Qwen/CodeQwen1.5-7B using Monolithic Preference Optimization without Reference Model (ORPO), trained on 100k samples of high-quality ranking data.
EvalPlus
EvalPlus | pass@1 |
---|---|
HumanEval | 86.6 |
HumanEval+ | 83.5 |
MBPP(v0.2.0) | 82.3 |
MBPP+(v0.2.0) | 70.4 |
We use a simple prompt template to generate solutions for EvalPlus:
"Complete the following Python function:\n{prompt}"
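As a minimal sketch of how this template might be applied, the snippet below fills `{prompt}` with a problem's function signature and docstring. Only the template string comes from this README; the helper name `build_prompt` and the sample problem are hypothetical and used purely for illustration.

```python
# Template string quoted from this README.
TEMPLATE = "Complete the following Python function:\n{prompt}"


def build_prompt(problem_prompt: str) -> str:
    """Wrap a benchmark problem prompt in the template (hypothetical helper)."""
    return TEMPLATE.format(prompt=problem_prompt)


# Hypothetical problem, shaped like a function signature plus docstring.
example_problem = (
    "def add(a: int, b: int) -> int:\n"
    '    """Return the sum of a and b."""\n'
)

full_prompt = build_prompt(example_problem)
print(full_prompt)
```

The resulting string would then be sent to the model as the user message. How the message is wrapped in a chat format depends on the runtime and is not specified here.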
Models | HumanEval | HumanEval+ |
---|---|---|
GPT-4-Turbo (April 2024) | 90.2 | 86.6 |
GPT-4 (May 2023) | 88.4 | 81.17 |
GPT-4-Turbo (Nov 2023) | 85.4 | 79.3 |
CodeQwen1.5-7B-Chat | 83.5 | 78.7 |
claude-3-opus (Mar 2024) | 82.9 | 76.8 |
DeepSeek-Coder-33B-instruct | 81.1 | 75.0 |
WizardCoder-33B-V1.1 | 79.9 | 73.2 |
OpenCodeInterpreter-DS-33B | 79.3 | 73.8 |
speechless-codellama-34B-v2.0 | 77.4 | 72 |
GPT-3.5-Turbo (Nov 2023) | 76.8 | 70.7 |
Llama3-70B-instruct | 76.2 | 70.7 |