TomPei committed
Commit 99b3689 · verified · 1 Parent(s): ac78348

Update README.md

Files changed (1): README.md +9 -2
README.md CHANGED
@@ -47,8 +47,11 @@ This is the repository for the base 7B version finetuned based on [CodeLlama-7b-
 ## Model Eval
 
 HumanEval is the most common code generation benchmark used to evaluate the performance of models, especially on the completion of code exercise cases.
-Somehow, model evaluation is a kind of metaphysics. Different models are sensitive to different decoding methods and parameters.
-It is impractical for us to manually set a specific configuration for each fine-tuned model, because a real LLM should master the universal capability despite the parameters manipulated by users
+In some ways, model evaluation is a kind of metaphysics: different models are sensitive to different decoding methods, parameters, and instructions.
+It is impractical for us to manually set a specific configuration for each fine-tuned model, because a real LLM should master universal capability regardless of the parameters manipulated by users.
+
+Thus, OpenCSG racked our brains to provide a relatively fair way to compare the fine-tuned models on the HumanEval benchmark.
+To simplify the comparison, we chose the pass@1 metric for the Python language, although our fine-tuning dataset includes multi-language samples.
 
 | Model | HumanEval python pass@1 |
 | --- | --- |
@@ -59,6 +62,10 @@ It is impractical for us to manually set a specific configuration for each fine-tun
 | CodeLlama-34b-hf | 48.2% |
 | opencsg-CodeLlama-34b-v0.1(4k) | **48.8%** |
 
+**TODO**
+- We will provide more benchmark scores for fine-tuned models in the future.
+- We will provide different practical problems to evaluate the performance of fine-tuned models in the field of software engineering.
+
 ```python
 from transformers import AutoTokenizer
 import transformers
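
For readers unfamiliar with the metric referenced in the diff above: pass@1 scores such as those in the table are conventionally computed with the unbiased pass@k estimator from the original HumanEval paper (Chen et al., 2021). The sketch below shows that standard estimator; the function name and the toy sample counts are illustrative assumptions, and this is not necessarily the exact harness OpenCSG used.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated for one problem
    c: samples that pass the problem's unit tests
    k: evaluation budget; returns P(at least one of k draws is correct)
    """
    if n - c < k:
        # Too few failing samples: every size-k draw contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# The benchmark score is the mean of the per-problem estimates;
# with k=1 the estimator reduces to c / n for each problem.
# Toy (n, c) counts for illustration, not OpenCSG's measured results:
samples = [(10, 5), (10, 0), (10, 10)]
score = sum(pass_at_k(n, c, 1) for n, c in samples) / len(samples)
print(f"pass@1 = {score:.1%}")  # -> pass@1 = 50.0%
```

Averaging per-problem estimates, rather than pooling raw pass counts across problems, keeps problems with different sample counts equally weighted.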