xeon27 committed on
Commit
5652cd0
·
1 Parent(s): 363cbd2

Update reproducibility text

Files changed (1)
  1. src/about.py +24 -22
src/about.py CHANGED
@@ -113,32 +113,34 @@ These benchmarks go beyond basic reasoning and evaluate more advanced, autonomou
 """

 REPRODUCIBILITY_TEXT = """
- ## Reproduce and Extend the Leaderboard
-
- ### 1) Make sure you can load your model and tokenizer using AutoClasses:
- ```python
- from transformers import AutoConfig, AutoModel, AutoTokenizer
-
- # "your model name" is the model's Hub id; `revision` is the branch or commit hash you submitted
- config = AutoConfig.from_pretrained("your model name", revision=revision)
- model = AutoModel.from_pretrained("your model name", revision=revision)
- tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
- ```
- If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.
-
- Note: make sure your model is public!
- Note: if your model needs `trust_remote_code=True`, we do not support this option yet, but we are working on adding it. Stay posted!
-
- ### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
- It's a newer format for storing weights that is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!
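A minimal conversion sketch (not from this page): recent `transformers` releases can re-save an existing checkpoint directly in safetensors format via `save_pretrained`. The model name and output path below are placeholders:

```python
from transformers import AutoModel

# Load the existing (e.g. pickle-based) checkpoint; placeholder name, as above.
model = AutoModel.from_pretrained("your model name")

# Re-save with safetensors serialization (the default in recent transformers
# releases), then upload the resulting files to your model repo.
model.save_pretrained("converted-checkpoint", safe_serialization=True)
```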
 
 
- ### 3) Make sure your model has an open license!
- This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗
 
- ### 4) Fill out your model card
- When we add extra information about models to the leaderboard, it will be taken automatically from the model card.
 
- ## In case of model failure
- If your model is displayed in the `FAILED` category, its execution stopped.
- Make sure you have followed the above steps first.
- If everything is done, check that you can launch the EleutherAI Harness on your model locally (you can add `--limit` to cap the number of examples per task).
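A hedged sketch of such a local check, assuming the current `lm_eval` CLI from EleutherAI's lm-evaluation-harness; the model id and task here are placeholders, not leaderboard settings:

```bash
# Install EleutherAI's evaluation harness.
pip install lm-eval

# Smoke-test a single task; --limit caps the number of examples per task.
lm_eval --model hf \
  --model_args pretrained=your-org/your-model,revision=main \
  --tasks hellaswag \
  --limit 10
```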
 
 
+ ## 🛠️ Reproducibility
+ The [Vector State of Evaluation Leaderboard repository](https://github.com/VectorInstitute/evaluation) contains the evaluation script used to reproduce the results presented on the leaderboard.
+
+ ### Install dependencies
+ 1. Create a Python virtual environment with `python>=3.10` and activate it:
+ ```bash
+ python -m venv env
+ source env/bin/activate
+ ```
+ 2. Install `inspect_ai`, `inspect_evals`, and the other dependencies listed in `requirements.txt`:
+ ```bash
+ python -m pip install -r requirements.txt
+ ```
+ 3. Install any packages required for the models you'd like to evaluate or use as grader models (the `openai` package is already included in `requirements.txt`):
+ ```bash
+ python -m pip install <model_package>
+ ```
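Hosted providers also expect their API keys as environment variables. Purely as a hypothetical illustration, since which packages and keys you need depends on the models in your config:

```bash
# Hypothetical example: adding Anthropic models alongside the bundled openai package.
python -m pip install anthropic
export OPENAI_API_KEY=<your_key>      # models or graders served via OpenAI
export ANTHROPIC_API_KEY=<your_key>   # models served via Anthropic
```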
+ ### Run Inspect evaluation
+ 1. Update `src/evals_cfg/run_cfg.yaml` to select the evals (base/agentic) and list all models to be evaluated (a hypothetical sketch follows these steps).
+ 2. Run the evaluation:
+ ```bash
+ python src/run_evals.py
+ ```
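The schema of `src/evals_cfg/run_cfg.yaml` is defined by the repository itself; purely as a hypothetical sketch of step 1, the selection might look something like:

```yaml
# Hypothetical sketch only -- illustrative field names, not the repo's actual schema.
evals: base            # which suite to run: base or agentic
models:
  - openai/gpt-4o
  - anthropic/claude-3-5-sonnet-latest
```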
  """