Update README.md
README.md CHANGED
@@ -46,7 +46,9 @@ This is the repository for the base 7B version finetuned based on [CodeLlama-7b-
 
 ## Model Eval
 
-HumanEval is the
+HumanEval is the most common code generation benchmark for evaluating model performance, especially on the completion of code exercise cases.
+To some extent, model evaluation is a kind of metaphysics: different models are sensitive to different decoding methods and parameters.
+It is impractical for us to manually set a specific configuration for each fine-tuned model, because a truly capable LLM should retain its general ability regardless of the parameters users set.
 
 | Model | HumanEval python pass@1 |
 | --- |----------------------------------------------------------------------------- |
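For reference, HumanEval pass@1 is normally computed with the unbiased pass@k estimator from the original HumanEval paper (Chen et al., 2021): generate n samples per problem, count how many pass the bundled unit tests, and average the per-problem estimates. Below is a minimal sketch assuming you already have per-problem sample and pass counts; the `pass_at_k` helper and the example counts are illustrative and not part of this repository.

```python
# Minimal sketch of the unbiased pass@k estimator (Chen et al., 2021).
# The per-problem counts below are made-up examples; in practice they come
# from running generated samples against the HumanEval unit tests.
import numpy as np


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for a single problem.

    n: total generated samples for the problem
    c: samples that passed all unit tests
    k: the k in pass@k (k=1 for HumanEval python pass@1)
    """
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))


# Example: 3 problems, 20 samples each, with 5 / 0 / 20 passing samples.
num_correct = [5, 0, 20]
score = float(np.mean([pass_at_k(20, c, k=1) for c in num_correct]))
print(f"pass@1 = {score:.3f}")  # (5/20 + 0/20 + 20/20) / 3 ≈ 0.417
```

With k=1 the estimator reduces to the fraction of passing samples per problem, so a single greedy sample per problem reports pass@1 directly.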