---
title: MLRC-BENCH
emoji: 📊
colorFrom: green
colorTo: blue
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
license: cc-by-4.0
---
## Overview
This application provides a visual leaderboard for comparing AI model performance on challenging Machine Learning Research Competition problems. It uses Streamlit to build an interactive web interface whose filters let users select specific models and tasks to compare.

The leaderboard uses the MLRC-BENCH benchmark, which measures the percentage of the performance gap between the baseline and the top human solution that an agent closes. Success is defined as closing at least 5% of the margin by which the top human solution surpasses the baseline.
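
As a rough illustration of the metric described above, the sketch below computes the percentage of the baseline-to-human gap that an agent closes and applies the 5% success threshold. The function names and example scores are hypothetical and are not taken from the application's code.

```python
def gap_closed_percent(agent_score: float, baseline_score: float, human_score: float) -> float:
    """Percentage of the baseline-to-top-human gap that the agent closes."""
    gap = human_score - baseline_score
    if gap == 0:
        raise ValueError("top human and baseline scores are equal; the gap is undefined")
    return 100.0 * (agent_score - baseline_score) / gap


def is_success(agent_score: float, baseline_score: float, human_score: float,
               threshold_percent: float = 5.0) -> bool:
    """True when the agent closes at least `threshold_percent` of the gap."""
    return gap_closed_percent(agent_score, baseline_score, human_score) >= threshold_percent


# Hypothetical scores: baseline 60, top human 80, agent 62
print(gap_closed_percent(62.0, 60.0, 80.0))  # → 10.0
print(is_success(62.0, 60.0, 80.0))          # → True
```

Because both differences change sign together, the same ratio also applies to metrics where lower is better.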
## Installation & Setup
1. Clone the repository
```bash
git clone https://huggingface.co/spaces/launch/MLRC_Bench
cd MLRC_Bench
```
2. Set up a virtual environment and install the required dependencies
```bash
python -m venv env
source env/bin/activate
pip install -r requirements.txt
```
3. Run the application
```bash
streamlit run app.py
```
### Updating Metrics
To update the table, edit the corresponding metric file in the `src/data/metrics` directory.
### Updating Text
To update the Benchmark details tab, edit `src/components/tasks.py`.

To update the metric definitions, edit `src/components/tasks.py`.
### Adding New Metrics
To add a new metric:
1. Create a new JSON data file in the `src/data/metrics/` directory (e.g., `src/data/metrics/new_metric.json`)
2. Update `metrics_config` in `src/utils/config.py`:
```python
metrics_config = {
    "Margin to Human": { ... },  # existing entry, left unchanged
    "New Metric Name": {
        "file": "src/data/metrics/new_metric.json",
        "description": "Description of the new metric",
        "min_value": 0,
        "max_value": 100,
        "color_map": "viridis"
    }
}
```
3. Ensure your metric JSON file follows the same format as existing metrics (each `value` below is a placeholder for that model's result on the task):
```json
{
  "task-name": {
    "model-name-1": value,
    "model-name-2": value
  },
  "another-task": {
    "model-name-1": value,
    "model-name-2": value
  }
}
```
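
Before adding a new metric file, it can help to check that it matches the task-to-model-to-value layout shown above. The following is a minimal sketch, assuming every value must be a number and optionally fall within the `min_value`/`max_value` range from `metrics_config`. The helper `validate_metric_data` is hypothetical and not part of the repository.

```python
import json
from numbers import Real


def validate_metric_data(data, min_value=None, max_value=None):
    """Return a list of problems found in a task -> model -> value mapping."""
    if not isinstance(data, dict):
        return ["top level must be an object mapping task names to model values"]
    problems = []
    for task, scores in data.items():
        if not isinstance(scores, dict):
            problems.append(f"{task}: expected an object of model values")
            continue
        for model, value in scores.items():
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, Real):
                problems.append(f"{task}/{model}: value is not a number")
            elif min_value is not None and value < min_value:
                problems.append(f"{task}/{model}: {value} is below {min_value}")
            elif max_value is not None and value > max_value:
                problems.append(f"{task}/{model}: {value} is above {max_value}")
    return problems


# Hypothetical file contents; in practice, read them with json.load(open(path))
sample = json.loads('{"task-name": {"model-name-1": 12.5, "model-name-2": "n/a"}}')
print(validate_metric_data(sample, min_value=0, max_value=100))
```

An empty list means the data matches the assumed layout. If the existing metric files use non-numeric markers for missing results, the numeric check would need to be relaxed.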
### Adding New Agent Types
To add new agent types:
1. Update `model_categories` in `src/utils/config.py`:
```python
model_categories = {
    "Existing Model": "Category",
    "New Model Name": "New Category"
}
```
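
To illustrate how a model-to-category mapping of this shape can drive a category filter, the sketch below inverts it into a category-to-models lookup. How the application actually consumes `model_categories` is not shown here, so the helper `models_by_category` and the entries are hypothetical.

```python
from collections import defaultdict

# Hypothetical entries mirroring the shape of `model_categories`
model_categories = {
    "Existing Model": "Category",
    "New Model Name": "New Category",
}


def models_by_category(categories):
    """Group model names under their category, preserving insertion order."""
    grouped = defaultdict(list)
    for model, category in categories.items():
        grouped[category].append(model)
    return dict(grouped)


print(models_by_category(model_categories))
```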
## License
[MIT License](LICENSE)