# Test-suite Reduction
## Preparation Work
Test-suite reduction relies on the results of evaluation, so make sure you have run the evaluation script and that an `eval_results.json` has been generated for each model under test (a quick sanity check is sketched below).
Use the following command to install necessary dependencies:
```bash
# in $EVALPLUS_ROOT
pip install -r requirements-tsr.txt
```
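Before running reduction, it can be worth checking that every model directory actually contains an `eval_results.json`. Below is a minimal sketch of such a check; the directory path is a placeholder for your own `SAMPLE_EVAL_DIR` (described under Usage), not something the tool requires:
```python
import sys
from pathlib import Path

# Placeholder: point this at your own SAMPLE_EVAL_DIR (see --sample_eval_dir below).
sample_eval_dir = Path.home() / "HumanEval"

# Collect model directories that are missing their evaluation results.
missing = [
    d.name
    for d in sorted(sample_eval_dir.iterdir())
    if d.is_dir() and not (d / "eval_results.json").is_file()
]
if missing:
    sys.exit(f"Missing eval_results.json for: {', '.join(missing)}")
print("All model directories contain eval_results.json.")
```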
## Usage
```bash
python3 run.py \
--dataset DATASET \
--sample_eval_dir SAMPLE_EVAL_DIR \
--model MODEL \
[--report_dir REPORT_DIR]
# Example
python3 run.py --dataset humaneval --sample_eval_dir $HOME/HumanEval --model ALL
```
Parameter descriptions:
* `--dataset`: currently, `humaneval` and `mbpp` are supported.
* `--sample_eval_dir` is the directory containing all the LLM evaluation results. We require the directory to be structured as follows:
```bash
SAMPLE_EVAL_DIR
├── LLM_1
│   ├── ...
│   └── eval_results.json
├── LLM_2
│   └── ...
└── ...
```
* `--report_dir` is the directory where we store intermediate files, pass@k results, and the reduced dataset. If not specified, `REPORT_DIR` defaults to `./tsr_info`. (A reference sketch of the pass@k estimator is given after this list.)
* If `MODEL` is a specific LLM name, the cross-validation results will be generated in `REPORT_DIR`; if `MODEL == ALL`, a reduced dataset will be generated in `REPORT_DIR`.
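For reference, the pass@k numbers written to `REPORT_DIR` are presumably computed with the standard unbiased estimator popularized by the HumanEval benchmark. The sketch below assumes `n` generated samples per problem, of which `c` pass, and is not tied to any specific `run.py` internals:
```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples per problem, c of them pass, k draws.

    Equivalent to 1 - C(n - c, k) / C(n, k), evaluated stably as a product.
    """
    if n - c < k:
        return 1.0  # fewer than k failing samples, so any k draws include a pass
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```
For example, `pass_at_k(n=200, c=7, k=10)` estimates the probability that at least one of 10 samples drawn from the 200 generations passes.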
## Known Issues
If you find the program stuck at the mutant generation step, try removing the line
```python
assert len(completion_id) == len(problems), "Missing problems in samples"
```
in `evalplus/evaluate.py`.