Update README.md
README.md (changed)

```diff
@@ -67,6 +67,12 @@ To simplify the comparison, we chose the Pass@1 metric for the Python language,
 | CodeLlama-34b-hf           | 48.2%     |
 | opencsg-CodeLlama-34b-v0.1 | **56.1%** |
 | opencsg-CodeLlama-34b-v0.2 | **64.0%** |
+| CodeLlama-70b-hf           | 53.0%     |
+| CodeLlama-70b-Instruct-hf  | **67.8%** |
+
+
+
+

 **TODO**
 - We will provide more benchmark scores on fine-tuned models in the future.
```
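This hunk adds the two CodeLlama-70b rows to the Pass@1 table. For context, HumanEval Pass@1 scores like these are conventionally computed with the unbiased pass@k estimator from the HumanEval paper: draw n completions per problem, count the c that pass the unit tests, and estimate pass@k = 1 - C(n-c, k)/C(n, k). Below is a minimal sketch in Python; the n=10 example values are illustrative, and how many samples were actually drawn for the scores in this table is not stated in the diff:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    1 - C(n-c, k) / C(n, k), in numerically stable product form."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative values: 10 samples per task, 4 of which pass the tests.
print(pass_at_k(n=10, c=4, k=1))  # 0.4 — for k=1 the estimator reduces to c/n
```

For k=1 the estimator reduces to the fraction of passing samples, which is why Pass@1 is the simplest score to compare across models.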
```diff
@@ -180,6 +186,8 @@ HumanEval is the most common benchmark for evaluating a model's performance in code generation, especially
 | CodeLlama-34b-hf           | 48.2%     |
 | opencsg-CodeLlama-34b-v0.1 | **56.1%** |
 | opencsg-CodeLlama-34b-v0.2 | **64.0%** |
+| CodeLlama-70b-hf           | 53.0%     |
+| CodeLlama-70b-Instruct-hf  | **67.8%** |

 **TODO**
 - We will provide scores for more fine-tuned models on various benchmarks in the future.
```
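Since the table compares raw HumanEval numbers, a reader may want to generate a completion with one of the listed checkpoints. Here is a minimal sketch using the Hugging Face transformers API; the opencsg/opencsg-CodeLlama-34b-v0.2 repo id, dtype, and generation settings are assumptions on my part rather than anything specified in this diff:

```python
# Sketch: greedy-decode one HumanEval-style prompt.
# Repo id and generation settings are assumed, not taken from this README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "opencsg/opencsg-CodeLlama-34b-v0.2"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = 'def fib(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Greedy decoding (do_sample=False) mirrors the deterministic single-sample setting that Pass@1 measures; sampling with temperature would instead be the regime for pass@k with k > 1.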