Update README.md

We use the state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).

## Reproducing Evaluation Results

Install LM Evaluation Harness:
```
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```
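
As a quick sanity check that the editable install resolved, the package should be importable from the cloned tree (a minimal sketch, assuming the harness exposes itself as `lm_eval`, the package name used in the EleutherAI repo):
```
# Sanity check for the editable install: `lm_eval` is the package name
# used by the EleutherAI repo; adjust if your checkout differs.
import lm_eval

# Should print a path inside the cloned lm-evaluation-harness directory.
print(lm_eval.__file__)
```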
Each task was evaluated on a single A100 80GB GPU.

ARC:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/GPlatty-30B --tasks arc_challenge --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/arc_challenge_25shot.json --device cuda --num_fewshot 25
```
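
Each run writes its scores to the file given by `--output_path`. A minimal sketch for pulling the metrics back out, assuming the harness writes a top-level `"results"` dict keyed by task name (verify against your own output file):
```
# Read back the metrics written by --output_path. The "results" layout is
# an assumption about this harness version; check your JSON if keys differ.
import json

with open("results/Platypus-30B/arc_challenge_25shot.json") as f:
    data = json.load(f)

for task, metrics in data["results"].items():
    print(task, metrics)  # e.g. acc / acc_norm for arc_challenge
```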

HellaSwag:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/GPlatty-30B --tasks hellaswag --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/hellaswag_10shot.json --device cuda --num_fewshot 10
```

MMLU:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/GPlatty-30B --tasks hendrycksTest-* --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/mmlu_5shot.json --device cuda --num_fewshot 5
```
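
`hendrycksTest-*` expands to the 57 MMLU subtasks, so the output JSON holds one entry per subtask rather than a single MMLU score. A sketch of the usual unweighted average over subtasks (same assumed `"results"` layout as above; confirm whether `acc` or `acc_norm` is the metric you want to report):
```
# Hypothetical aggregation: unweighted mean of per-subtask accuracy,
# which is how a single MMLU number is commonly reported.
import json

with open("results/Platypus-30B/mmlu_5shot.json") as f:
    results = json.load(f)["results"]

accs = [m["acc"] for m in results.values()]
print(f"MMLU 5-shot average over {len(accs)} subtasks: {sum(accs) / len(accs):.4f}")
```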

TruthfulQA:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/GPlatty-30B --tasks truthfulqa_mc --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/truthfulqa_0shot.json --device cuda
```

TruthfulQA is evaluated zero-shot, which is why this command omits `--num_fewshot`.