---
title: MLRC-BENCH
emoji: 📊
colorFrom: green
colorTo: blue
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
license: cc-by-4.0
---

## Overview

This application provides a visual leaderboard for comparing AI model performance on challenging Machine Learning Research Competition problems. It uses Streamlit to create an interactive web interface with filtering options, allowing users to select specific models and tasks for comparison.

The leaderboard uses the MLRC-BENCH benchmark, which measures what percentage of the top human-to-baseline performance gap an agent can close. Success is defined as achieving at least 5% of the margin by which the top human solution surpasses the baseline.
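
As a rough illustration of that criterion (the scores below are made-up placeholder values, not benchmark data):

```python
# Illustrative sketch of the "Margin to Human" success criterion.
baseline_score = 0.60    # score of the provided baseline solution
top_human_score = 0.80   # score of the top human solution
agent_score = 0.62       # score of the agent's solution

# Fraction of the human-to-baseline gap that the agent closes.
margin_to_human = (agent_score - baseline_score) / (top_human_score - baseline_score)

print(f"Margin to human: {margin_to_human:.0%}")            # 10%
print("Success" if margin_to_human >= 0.05 else "Failure")  # success: at least 5% of the gap
```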

## Installation & Setup

1. Clone the repository
   ```bash
   git clone https://huggingface.co/spaces/launch/MLRC_Bench
   cd MLRC_Bench
   ```

2. Set up a virtual environment and install the required dependencies
   ```bash
   python -m venv env
   source env/bin/activate
   pip install -r requirements.txt
   ```

3. Run the application
   ```bash
   streamlit run app.py
   ```
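
   Streamlit prints a local URL in the terminal (by default http://localhost:8501) where the leaderboard can be viewed.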

### Updating Metrics

To update the leaderboard table, edit the corresponding metric file in the `src/data/metrics` directory.
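
The expected JSON format for metric files is shown under "Adding New Metrics" below.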

### Updating Text

- To update the Benchmark Details tab, make changes to `src/components/tasks.py`.
- To update the metric definitions, also make changes to `src/components/tasks.py`.

### Adding New Metrics

To add a new metric:

1. Create a new JSON data file in the `src/data/metrics/` directory (e.g., `src/data/metrics/new_metric.json`)

2. Update `metrics_config` in `src/utils/config.py`:
   ```python
   metrics_config = {
       "Margin to Human": { ... },
       "New Metric Name": {
           "file": "src/data/metrics/new_metric.json",
           "description": "Description of the new metric",
           "min_value": 0,
           "max_value": 100,
           "color_map": "viridis"
       }
   }
   ```

3. Ensure your metric JSON file follows the same format as existing metrics:
   ```json
   {
     "task-name": {
       "model-name-1": value,
       "model-name-2": value
     },
     "another-task": {
       "model-name-1": value,
       "model-name-2": value
     }
   }
   ```

### Adding New Agent Types

To add new agent types:

1. Update `model_categories` in `src/utils/config.py`:
   ```python
   model_categories = {
       "Existing Model": "Category",
       "New Model Name": "New Category"
   }
   ```

## License

[MIT License](LICENSE)