MLRC_Bench

Armeddinosaur committed on Mar 30

Commit 17ad9a6

1 Parent(s): cf2253a

Updating readme

README.md +15 -131

@@ -16,90 +16,18 @@ This application provides a visual leaderboard for comparing AI model performanc
 The leaderboard uses the MLRC-BENCH benchmark, which measures what percentage of the top human-to-baseline performance gap an agent can close. Success is defined as achieving at least 5% of the margin by which the top human solution surpasses the baseline.
-### Key Features
-- **Interactive Filtering**: Select specific model types and tasks to focus on
-- **Customizable Metrics**: Compare models using "Margin to Human" performance scores
-- **Hierarchical Table Display**: Fixed columns with scrollable metrics section
-- **Conditional Formatting**: Visual indicators for positive/negative values
-- **Model Type Color Coding**: Different colors for Open Source, Open Weights, and Closed Source models
-- **Medal Indicators**: Top-ranked models receive gold, silver, and bronze medals
-- **Task Descriptions**: Detailed explanations of what each task measures
-## Project Structure
-The codebase follows a modular architecture for improved maintainability and separation of concerns:
-```
-app.py (main entry point)
-├── requirements.txt
-└── src/
-    ├── app.py (main application logic)
-    ├── components/
-    │   ├── header.py (header and footer components)
-    │   ├── filters.py (filter selection components)
-    │   ├── leaderboard.py (leaderboard table component)
-    │   └── tasks.py (task descriptions component)
-    ├── data/
-    │   ├── processors.py (data processing utilities)
-    │   └── metrics/
-    │       └── margin_to_human.json (metric data file)
-    ├── styles/
-    │   ├── base.py (combined styles)
-    │   ├── components.py (component styling)
-    │   ├── tables.py (table-specific styling)
-    │   └── theme.py (theme definitions)
-    └── utils/
-        ├── config.py (configuration settings)
-        └── data_loader.py (data loading utilities)
-```
-### Module Descriptions
-#### Core Files
-- `app.py` (root): Simple entry point that imports and calls the main function
-- `src/app.py`: Main application logic, coordinates the overall flow
-#### Components
-- `header.py`: Manages the page header, section headers, and footer components
-- `filters.py`: Handles metric, task, and model type selection interfaces
-- `leaderboard.py`: Renders the custom HTML leaderboard table
-- `tasks.py`: Renders the task descriptions section
-#### Data Processing
-- `processors.py`: Contains utilities for data formatting and styling
-- `data_loader.py`: Functions for loading and processing metric data
-#### Styling
-- `theme.py`: Base theme definitions and color schemes
-- `components.py`: Styling for UI components (buttons, cards, etc.)
-- `tables.py`: Styling for tables and data displays
-- `base.py`: Combines all styles for application-wide use
-#### Configuration
-- `config.py`: Contains all configuration settings including themes, metrics, and model categorizations
-## Benefits of Modular Architecture
-The modular structure provides several advantages:
-1. **Improved Code Organization**: Code is logically separated based on functionality
-2. **Better Separation of Concerns**: Each module has a clear, single responsibility
-3. **Enhanced Maintainability**: Changes to one aspect don't require modifying the entire codebase
-4. **Simplified Testing**: Components can be tested independently
-5. **Easier Collaboration**: Multiple developers can work on different parts simultaneously
-6. **Cleaner Entry Point**: Main app file is simple and focused
 ## Installation & Setup
 1. Clone the repository
-   ```bash
-   git clone <repository-url>
-   cd model-capability-leaderboard
-   ```
-2. Install the required dependencies
    ```bash
    pip install -r requirements.txt
    ```
@@ -108,7 +36,14 @@ The modular structure provides several advantages:
    streamlit run app.py
    ```
-## Extending the Application
 ### Adding New Metrics
@@ -156,57 +91,6 @@ To add new model types:
    }
    ```
-### Modifying the UI Theme
-To change the theme colors:
-1. Update the `dark_theme` dictionary in `src/utils/config.py`
-### Adding New Components
-To add new visualization components:
-1. Create a new file in the `src/components/` directory
-2. Import and use the component in `src/app.py`
-## Data Format
-The application uses JSON files for metric data. The expected format is:
-```json
-{
-  "task-name": {
-    "model-name-1": value,
-    "model-name-2": value
-  },
-  "another-task": {
-    "model-name-1": value,
-    "model-name-2": value
-  }
-}
-```
-## Testing
-This modular structure makes it easier to write focused unit tests:
-```python
-# Example test for data_loader.py
-def test_process_data():
-    test_data = {"task": {"model": 0.5}}
-    df = process_data(test_data)
-    assert "Task" in df.columns
-    assert df.loc["model", "Task"] == 0.5
-```
 ## License
 [MIT License](LICENSE)
-## Contributing
-Contributions are welcome! Please feel free to submit a Pull Request.
-## Contact
-For any questions or feedback, please contact [[email protected]](mailto:[email protected]).

 The leaderboard uses the MLRC-BENCH benchmark, which measures what percentage of the top human-to-baseline performance gap an agent can close. Success is defined as achieving at least 5% of the margin by which the top human solution surpasses the baseline.
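The success criterion above can be sketched in a few lines. This is an illustration only: it assumes higher scores are better and that the metric is a simple ratio of gaps. The function names `margin_to_human` and `is_success` are hypothetical and do not come from the repository.

```python
def margin_to_human(agent: float, baseline: float, top_human: float) -> float:
    """Percentage of the human-to-baseline gap that the agent closes.

    Assumption: higher scores are better for all three inputs.
    """
    gap = top_human - baseline
    if gap == 0:
        raise ValueError("top human and baseline scores are equal; the gap is undefined")
    return 100.0 * (agent - baseline) / gap


def is_success(agent: float, baseline: float, top_human: float,
               threshold: float = 5.0) -> bool:
    """True if the agent closes at least `threshold` percent of the gap."""
    return margin_to_human(agent, baseline, top_human) >= threshold


if __name__ == "__main__":
    # Baseline scores 50, top human scores 100, agent scores 60:
    # the agent closes 10 of the 50-point gap.
    print(margin_to_human(60, 50, 100))
    print(is_success(60, 50, 100))
```

A negative result means the agent scored below the baseline.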
 ## Installation & Setup
 1. Clone the repository
+  ```bash
+  git clone https://huggingface.co/spaces/launch/MLRC_Bench
+  cd MLRC_Bench
+  ```
+2. Set up a virtual environment and install the required dependencies
    ```bash
+   python -m venv env
+   source env/bin/activate
    pip install -r requirements.txt
    ```
    streamlit run app.py
    ```
+### Updating Metrics
+To update the table, edit the corresponding metric file in the `src/data/metrics` directory.
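Before committing an edited metric file, a quick sanity check can catch malformed entries. The sketch below assumes the metric files keep the nested task-to-model-to-value JSON layout described in the earlier README revision. `validate_metric_data` and `load_metric_file` are hypothetical helpers, not functions from the repository.

```python
import json
from pathlib import Path


def validate_metric_data(data: dict) -> dict:
    """Check that data has the shape {task: {model: numeric value}}."""
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object keyed by task name")
    for task, scores in data.items():
        if not isinstance(scores, dict):
            raise ValueError(f"task {task!r} must map model names to values")
        for model, value in scores.items():
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{task!r}/{model!r} has a non-numeric value: {value!r}")
    return data


def load_metric_file(path: str) -> dict:
    """Read a metric JSON file and validate its structure."""
    text = Path(path).read_text(encoding="utf-8")
    return validate_metric_data(json.loads(text))
```

For example, `load_metric_file("src/data/metrics/margin_to_human.json")` would raise a `ValueError` naming the offending task or model if an entry is not numeric, assuming that file still exists at the path shown in the earlier project structure.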
+### Updating Text
+To update the Benchmark details tab, edit the following file: `src/components/tasks.py`
+To update the metric definitions, edit the following file: `src/components/tasks.py`
 ### Adding New Metrics
    }
    ```
 ## License
 [MIT License](LICENSE)