| Model                                | Code     | Chat     | Math     | Safety | Easy   | Normal   | Hard     | Avg      |
| ------------------------------------ | -------- | -------- | -------- | ------ | ------ | -------- | -------- | -------- |
| Skywork/Skywork-Reward-Llama-3.1-8B  | 54.5     | 69.5     | 60.6     | 95.7   | **89** | 74.7     | 46.6     | 70.1     |
| LxzGordon/URM-LLaMa-3.1-8B           | 54.1     | 71.2     | 61.8     | 93.1   | 84     | 73.2     | 53       | 70       |
| NVIDIA/Nemotron-340B-Reward          | 59.4     | 71.2     | 59.8     | 87.5   | 81     | 71.4     | 56.1     | 69.5     |
| NCSOFT/Llama-3-OffsetBias-RM-8B      | 53.2     | 71.3     | 61.9     | 89.6   | 84.6   | 72.2     | 50.2     | 69       |
| Ray2333/GRM-llama3-8B-distill        | 56.9     | 62.4     | 62.1     | 88.1   | 82.2   | 71.5     | 48.4     | 67.4     |
| Ray2333/GRM-Llama3-8B-rewardmodel-ft | 52.1     | 66.8     | 58.8     | 91.4   | 86.2   | 70.6     | 45.1     | 67.3     |
| LxzGordon/URM-LLLaMa-3-8B            | 52.3     | 68.5     | 57.6     | 90.3   | 80.2   | 69.9     | 51.5     | 67.2     |
| internlm/internlm2-7b-reward*        | 49.7     | 61.7     | **71.4** | 85.5   | 85.4   | 70.7     | 45.1     | 67.1     |
| Skywork-Reward-Llama-3.1-8B-v0.2*    | 53.4     | 69.2     | 62.1     | **96** | 88.5   | 74       | 47.9     | 70.1     |
| Skywork-Reward-Gemma-2-27B-v0.2*     | 45.8     | 49.4     | 50.7     | 48.2   | 50.3   | 48.2     | 47       | 48.5     |
| AceCoder-RM-7B                       | 66.9     | 66.7     | 65.3     | 89.9   | 79.9   | 74.4     | 62.2     | 72.2     |
| AceCoder-RM-32B                      | **72.1** | **73.7** | 70.5     | 88     | 84.5   | **78.3** | **65.5** | **76.1** |
| Delta (AceCoder 7B - Others)         | 7.5      | \-4.6    | \-6.1    | \-6.1  | \-9.1  | \-0.3    | 6.1      | 2.1      |
| Delta (AceCoder 32B - Others)        | 12.7     | 2.4      | \-0.9    | \-8    | \-4.5  | 3.6      | 9.4      | 6        |
\* These models have no official results because they were released after the RM-Bench paper; we therefore extended the original code base to evaluate them ourselves. Our implementation can be found here: [Modified Reward Bench / RM Bench Code](https://github.com/wyettzeng/reward-bench)
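The Delta rows report AceCoder's score minus the best score among the other models in each column. A minimal sketch of that arithmetic, using the Code and Chat columns from the table above (the function name is ours, not from the repository):

```python
def delta_vs_best(acecoder_score: float, other_scores: list[float]) -> float:
    """Delta = AceCoder's score minus the best score among the other models."""
    return round(acecoder_score - max(other_scores), 1)

# Code column scores of the non-AceCoder models in the table above
code_others = [54.5, 54.1, 59.4, 53.2, 56.9, 52.1, 52.3, 49.7, 53.4, 45.8]
print(delta_vs_best(66.9, code_others))  # 7.5, matching Delta (AceCoder 7B - Others)

# Chat column: the best baseline (71.3) beats AceCoder-RM-7B, so the delta is negative
chat_others = [69.5, 71.2, 71.2, 71.3, 62.4, 66.8, 68.5, 61.7, 69.2, 49.4]
print(delta_vs_best(66.7, chat_others))  # -4.6
```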
## Performance on Best-of-N sampling
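Best-of-N sampling generates N candidate programs per problem and keeps the one the reward model scores highest. A minimal sketch of the selection step, assuming some `reward_model` scorer (the placeholder below just scores by length; a real reward model such as AceCoder-RM would score quality):

```python
# Hypothetical stand-in for a reward model; real RMs return a learned
# scalar score for each candidate program.
def reward_model(program: str) -> float:
    return float(len(program))  # placeholder scoring for illustration only

def best_of_n(candidates: list[str]) -> str:
    """Best-of-N: return the candidate with the highest reward score."""
    return max(candidates, key=reward_model)

samples = ["def f(): pass", "def f(x): return x * 2", "def f(x): return x"]
print(best_of_n(samples))  # the highest-scoring candidate under the placeholder
```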