alKoGolik's picture
Upload 169 files
c87c295 verified
# Test-suite Reduction
## Preperation Work
As test-suite reduction relies on the results of evaluation, make sure that you've run the evaluation script and an `eval_results.json` has been generated for each model under test.
Use the following command to install necessary dependencies:
```bash
# in $EVALPLUS_ROOT
pip install -r requirements-tsr.txt
```
## Usage
```bash
python3 run.py \
--dataset DATASET \
--sample_eval_dir SAMPLE_DIR \
--model MODEL \
[--report_dir REPORT_DIR]
# Example
python3 run.py --dataset humaneval --sample_eval_dir $HOME/HumanEval --model ALL
```
Parameter descriptions:
* `--dataset`: currently, `humaneval` and `mbpp` are supported.
* `--sample_eval_dir` is the directory containing all the LLM evaluation results. We require the directory be structured as
```bash
SAMPLE_EVAL_DIR
β”œβ”€β”€ LLM_1
β”‚ β”œβ”€β”€ ...
β”‚Β Β  └── eval_results.json
β”œβ”€β”€ LLM_2
β”‚ β”œβ”€β”€ ...
β”œβ”€β”€ ...
```
* `--report_dir` is the directory where we store intermediate files, pass@k results, and reduced dataset. If not specified, `REPORT_DIR=./tsr_info` by default.
* If `MODEL` is a specific LLM name, the cross-validation results will be generated in `REPORT_DIR`; if `MODEL == ALL`, a reduced dataset will be generated in `REPORT_DIR`.
## Known Issues
If you find the program stuck at the mutant generation step, try removing the line
```python
assert len(completion_id) == len(problems), "Missing problems in samples"
```
in `evalplus/evaluate.py`.