Test-suite Reduction
Preparation Work
Test-suite reduction relies on the results of evaluation, so make sure that you have run the evaluation script and that an eval_results.json has been generated for each model under test.
Use the following command to install necessary dependencies:
# in $EVALPLUS_ROOT
pip install -r requirements-tsr.txt
Usage
python3 run.py \
--dataset DATASET \
--sample_eval_dir SAMPLE_DIR \
--model MODEL \
[--report_dir REPORT_DIR]
# Example
python3 run.py --dataset humaneval --sample_eval_dir $HOME/HumanEval --model ALL
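When reduction is run for several models from a script, it can help to assemble the command line programmatically. The following is a minimal sketch that uses only the flags shown above; the helper name `build_tsr_command` and the example path are hypothetical and not part of the tool.

```python
import shlex
from typing import List, Optional

# Datasets listed as supported in this README.
SUPPORTED_DATASETS = ("humaneval", "mbpp")


def build_tsr_command(
    dataset: str,
    sample_eval_dir: str,
    model: str = "ALL",
    report_dir: Optional[str] = None,
) -> List[str]:
    """Build the argv list for run.py (hypothetical helper, not part of the tool)."""
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset: {dataset!r}")
    cmd = [
        "python3", "run.py",
        "--dataset", dataset,
        "--sample_eval_dir", sample_eval_dir,
        "--model", model,
    ]
    # --report_dir is optional; when omitted, run.py uses its own default.
    if report_dir is not None:
        cmd += ["--report_dir", report_dir]
    return cmd


if __name__ == "__main__":
    # Print the command as a shell-quoted string for inspection.
    print(shlex.join(build_tsr_command("humaneval", "/path/to/HumanEval")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains run.py.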
Parameter descriptions:
- --dataset: currently, humaneval and mbpp are supported.
- --sample_eval_dir is the directory containing all the LLM evaluation results. We require the directory be structured as

    SAMPLE_EVAL_DIR
    ├── LLM_1
    │   ├── ...
    │   └── eval_results.json
    ├── LLM_2
    │   ├── ...
    ├── ...

- --report_dir is the directory where we store intermediate files, pass@k results, and reduced dataset. If not specified, REPORT_DIR=./tsr_info by default.
- If MODEL is a specific LLM name, the cross-validation results will be generated in REPORT_DIR; if MODEL == ALL, a reduced dataset will be generated in REPORT_DIR.
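Before starting a reduction run, it may be worth confirming that the directory passed as --sample_eval_dir has the required layout. The sketch below lists model subdirectories that lack an eval_results.json; the function name `find_missing_results` is hypothetical, and the check assumes every immediate subdirectory corresponds to one model.

```python
from pathlib import Path
from typing import List


def find_missing_results(sample_eval_dir: str) -> List[str]:
    """Return the names of model subdirectories without eval_results.json.

    Hypothetical helper: assumes each immediate subdirectory of
    sample_eval_dir holds the evaluation output of one model.
    """
    root = Path(sample_eval_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{sample_eval_dir} is not a directory")
    missing = []
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # The reduction step expects this file directly under each model directory.
        if not (model_dir / "eval_results.json").is_file():
            missing.append(model_dir.name)
    return missing
```

An empty list means every model directory contains the expected file; any names returned point to models whose evaluation still needs to be run.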
Known Issues
If you find the program stuck at the mutant generation step, try removing the line
assert len(completion_id) == len(problems), "Missing problems in samples"
in evalplus/evaluate.py.