Commit 5652cd0 · xeon27 committed · Parent(s): 363cbd2
Update reproducibility text

Files changed: src/about.py (+24 -22)
src/about.py CHANGED
@@ -113,32 +113,34 @@ These benchmarks go beyond basic reasoning and evaluate more advanced, autonomous
 """
 
 REPRODUCIBILITY_TEXT = """
-##
-
-
-```python
-from transformers import AutoConfig, AutoModel, AutoTokenizer
-config = AutoConfig.from_pretrained("your model name", revision=revision)
-model = AutoModel.from_pretrained("your model name", revision=revision)
-tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
-```
-If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.
 
-
-
 
-
-
 
-
-This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗
 
-
-
 
-
-
-
-
 """
 
 """
 
 REPRODUCIBILITY_TEXT = """
+## 🛠️ Reproducibility
+The [Vector State of Evaluation Leaderboard Repository](https://github.com/VectorInstitute/evaluation) contains the evaluation script used to reproduce the results presented on the leaderboard.
+
+### Install dependencies
 
+1. Create a Python virtual environment with `python>=3.10` and activate it:
+```bash
+python -m venv env
+source env/bin/activate
+```
 
+2. Install `inspect_ai`, `inspect_evals`, and the other dependencies listed in `requirements.txt`:
+```bash
+python -m pip install -r requirements.txt
+```
 
+3. Install any packages required for the models you'd like to evaluate or to use as grader models.
 
+Note: the `openai` package is already included in `requirements.txt`.
+```bash
+python -m pip install <model_package>
+```
 
+### Run Inspect evaluation
+1. Update `src/evals_cfg/run_cfg.yaml` to select the evals (base/agentic) and to list all models to be evaluated (a sketch of this workflow appears after the diff).
+2. Run the evaluation:
+```bash
+python src/run_evals.py
+```
 """
 
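The run step above boils down to: read `run_cfg.yaml`, then launch each selected eval against each listed model. Below is a minimal sketch of that loop, not the repository's actual `src/run_evals.py`: the `evals`/`models` keys are an assumed schema for illustration, and the task and model names are placeholders, while `inspect_ai`'s programmatic `eval()` entry point and `inspect_evals`' registered task names are real.

```python
# Minimal sketch of a config-driven Inspect run (hypothetical schema,
# not the actual src/run_evals.py). Assumed run_cfg.yaml layout:
#   evals:
#     - inspect_evals/gsm8k          # placeholder task from inspect_evals
#   models:
#     - openai/gpt-4o-mini           # placeholder provider/model string
import yaml  # PyYAML
from inspect_ai import eval  # inspect_ai's programmatic eval entry point

with open("src/evals_cfg/run_cfg.yaml") as f:
    cfg = yaml.safe_load(f)

# Evaluate every listed model on every selected eval; inspect_ai writes
# its logs to ./logs by default, which downstream tooling can aggregate.
for model in cfg["models"]:
    eval(cfg["evals"], model=model)
```

The grader-model packages from install step 3 matter here because tasks that use model-graded scoring typically instantiate their grader through the same provider packages at run time.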