## Model Eval

HumanEval is the most common code generation benchmark for evaluating model performance, especially on the completion of code exercises.
To some extent, model evaluation is an art: different models are sensitive to different decoding methods, parameters, and instructions.
It is impractical for us to hand-tune a specific configuration for each fine-tuned model, because a genuinely capable LLM should perform well regardless of the parameters users set.
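To make the point concrete, a decoding configuration is just a handful of sampling knobs. A minimal sketch with `transformers`' `GenerationConfig` follows; the values are illustrative, not the settings used for the scores below:

```python
from transformers import GenerationConfig

# One shared decoding configuration applied to every model under test,
# instead of hand-tuning per-model settings. Values are illustrative.
shared_config = GenerationConfig(
    do_sample=True,    # sampling; greedy decoding would set this to False
    temperature=0.2,
    top_p=0.95,
    max_new_tokens=256,
)
```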

Thus, OpenCSG strove to provide a relatively fair method for comparing the fine-tuned models on the HumanEval benchmark.
To simplify the comparison, we chose the Pass@1 metric for the Python language, though our fine-tuning dataset contains samples in multiple languages.
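For reference, Pass@k is conventionally estimated with the unbiased formula from the HumanEval paper, `1 - C(n-c, k) / C(n, k)`. A minimal sketch follows; the helper name `pass_at_k` and the example call are ours:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: n completions sampled per problem, c of them
    pass the unit tests; returns 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Pass@1 with a single sample per problem reduces to pass/fail.
print(pass_at_k(n=1, c=1, k=1))  # 1.0
```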

| Model                          | HumanEval Python Pass@1 |
| ------------------------------ | ----------------------- |
| …                              | …                       |
| CodeLlama-34b-hf               | 48.2%                   |
| opencsg-CodeLlama-34b-v0.1(4k) | **48.8%**               |

**TODO**
- We will provide more benchmark scores for fine-tuned models in the future.
- We will provide practical problems for evaluating fine-tuned models in the field of software engineering.

```python
from transformers import AutoTokenizer
import transformers
```
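As a minimal sketch of how these imports are typically used with the `transformers` text-generation pipeline (the checkpoint id `opencsg/opencsg-CodeLlama-7b-v0.1`, the prompt, and the generation settings are illustrative assumptions, not the project's published values):

```python
import torch
import transformers
from transformers import AutoTokenizer

# Assumed checkpoint id for illustration; substitute the actual
# repository id of the released model.
model = "opencsg/opencsg-CodeLlama-7b-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Complete a function signature; generation settings are illustrative.
sequences = pipeline(
    "def fibonacci(n):",
    do_sample=True,
    temperature=0.2,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=256,
)
for seq in sequences:
    print(seq["generated_text"])
```

`device_map="auto"` lets the weights be placed across available devices, and `float16` halves the memory footprint relative to full precision.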