# Test-suite Reduction
## Preparation Work
As test-suite reduction relies on evaluation results, make sure that you have run the evaluation script and that an `eval_results.json` has been generated for each model under test.
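As a quick sanity check before reducing, you can list which model directories are still missing their results file. The helper below is a hypothetical sketch, not part of EvalPlus; it only mirrors the layout the reduction script expects:

```python
from pathlib import Path

def missing_eval_results(sample_eval_dir):
    """Return names of model subdirectories lacking an eval_results.json.

    Hypothetical helper (not part of EvalPlus), shown only to illustrate
    the expected per-model layout.
    """
    root = Path(sample_eval_dir)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and not (d / "eval_results.json").is_file()
    )
```

An empty return value means every model directory is ready for reduction.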
Use the following command to install necessary dependencies:
```bash
# in $EVALPLUS_ROOT
pip install -r requirements-tsr.txt
```
## Usage
```bash
python3 run.py \
    --dataset DATASET \
    --sample_eval_dir SAMPLE_DIR \
    --model MODEL \
    [--report_dir REPORT_DIR]

# Example
python3 run.py --dataset humaneval --sample_eval_dir $HOME/HumanEval --model ALL
```
Parameter descriptions:
* `--dataset`: currently, `humaneval` and `mbpp` are supported.
* `--sample_eval_dir` is the directory containing all the LLM evaluation results. We require the directory be structured as
```bash
SAMPLE_EVAL_DIR
├── LLM_1
│   ├── ...
│   └── eval_results.json
├── LLM_2
│   ├── ...
└── ...
```
* `--report_dir` is the directory where we store intermediate files, pass@k results, and the reduced dataset. If not specified, `REPORT_DIR=./tsr_info` by default.
* If `MODEL` is a specific LLM name, the cross-validation results will be generated in `REPORT_DIR`; if `MODEL == ALL`, a reduced dataset will be generated in `REPORT_DIR`.
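The `MODEL` switch can be read as a simple dispatch. The function below is illustrative only (the real logic lives in `run.py`):

```python
def tsr_mode(model: str) -> str:
    """Illustrative sketch, not EvalPlus code: map the --model argument
    to the behavior described above."""
    # "ALL" produces a reduced dataset; any specific LLM name triggers
    # cross-validation for that model.
    return "reduce_dataset" if model == "ALL" else "cross_validate"
```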
## Known Issues
If the program appears stuck at the mutant generation step, try removing the line
```python
assert len(completion_id) == len(problems), "Missing problems in samples"
```
in `evalplus/evaluate.py`.
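If you would rather not delete the check outright, one alternative (our suggestion, not an official fix) is to downgrade it to a warning so incomplete sample sets are reported instead of aborting the run:

```python
import warnings

def check_sample_coverage(completion_id, problems):
    """Hypothetical replacement for the strict assert above: warn when
    some problems have no samples instead of raising AssertionError."""
    if len(completion_id) != len(problems):
        warnings.warn(
            f"Missing problems in samples: got {len(completion_id)} of "
            f"{len(problems)}"
        )
```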