
---
title: Turkish MMLU Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: cc-by-nc-4.0
short_description: Leaderboard showcasing Turkish MMLU dataset results.
---

πŸ† Turkish MMLU Leaderboard

A web application for exploring, evaluating, and comparing AI model performance on the Turkish Massive Multitask Language Understanding (MMLU) benchmark.

Features

  • πŸ“Š Interactive leaderboard with filtering capabilities
  • πŸ” Search through model responses
  • πŸ“ˆ Visualize section-wise performance results
  • βž• Submit new models for evaluation

Local Development

Prerequisites

  • Python 3.8+
  • pip

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/turkish_mmlu_leaderboard.git
    cd turkish_mmlu_leaderboard
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Run the application:

    python app.py
    
  4. Open your browser and navigate to http://127.0.0.1:7860

Deploying to Hugging Face Spaces

Option 1: Using the Hugging Face UI

  1. Go to Hugging Face Spaces
  2. Click "Create a new Space"
  3. Select "Gradio" as the SDK
  4. Upload your files or connect to your GitHub repository
  5. The Space will automatically build and deploy

Option 2: Using the Dockerfile

  1. Create a new Space on Hugging Face
  2. Select "Docker" as the SDK
  3. Upload your files including the Dockerfile
  4. The Space will build and deploy using your Dockerfile

Troubleshooting Hugging Face Deployment

If you encounter timeout issues when loading datasets:

  1. Check the Space logs for specific error messages
  2. Increase the timeout values in config.py
  3. Make sure your datasets are accessible from Hugging Face Spaces
  4. Consider using smaller datasets or pre-caching data
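Beyond raising the timeout values (item 2), transient failures can be absorbed by wrapping dataset loads in a small retry helper. The sketch below is illustrative only — `load_with_retry` is not an existing function in this repo — and assumes that a failed load raises an exception:

```python
import time

def load_with_retry(load_fn, max_retries=5, base_delay=2.0):
    """Call load_fn(), retrying with exponential backoff on failure.

    Hypothetical helper; adapt the retry count and delay to the
    timeout values you set in config.py.
    """
    for attempt in range(max_retries):
        try:
            return load_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))
```

On the last attempt the original exception is re-raised, so the Space logs still show the underlying error message.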

Configuration

The application can be configured by modifying the config.py file:

  • DatasetConfig: Configure dataset paths, cache settings, and refresh intervals
  • UIConfig: Customize the UI appearance
  • ModelConfig: Define model-related options
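As a rough sketch of what those three classes might hold — every field name and default below is an assumption for illustration, not the actual contents of config.py:

```python
from dataclasses import dataclass

@dataclass
class DatasetConfig:
    # Hypothetical fields; check config.py for the real names.
    results_dataset: str = "org/results-dataset"  # placeholder Hub path
    cache_dir: str = "./cache"
    refresh_interval_s: int = 3600  # how often to re-pull leaderboard data

@dataclass
class UIConfig:
    title: str = "Turkish MMLU Leaderboard"
    rows_per_page: int = 25

@dataclass
class ModelConfig:
    allowed_dtypes: tuple = ("torch.float16", "torch.bfloat16", "8bit", "4bit")
```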

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Start the configuration

Most of the variables to change for a default leaderboard are in src/env.py (replace the paths with your leaderboard's) and src/about.py (for the tasks).

Results files should be valid JSON with the following structure, where model_dtype can be "torch.float16", "torch.bfloat16", "8bit", or "4bit":

{
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "path of the model on the hub: org/model",
        "model_sha": "revision on the hub"
    },
    "results": {
        "task_name": {
            "metric_name": score
        },
        "task_name2": {
            "metric_name": score
        }
    }
}
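For instance, a minimal results file matching that schema could be written like this (the task names, metric name, scores, and output filename are placeholders):

```python
import json

# Placeholder values filling in the schema above.
result = {
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "org/model",  # path of the model on the Hub
        "model_sha": "main",        # revision on the Hub
    },
    "results": {
        "task_name": {"metric_name": 0.75},
        "task_name2": {"metric_name": 0.62},
    },
}

with open("results_org_model.json", "w", encoding="utf-8") as f:
    json.dump(result, f, indent=4)
```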

Request files are created automatically by this tool.

If you encounter a problem on the Space, don't hesitate to restart it; this removes the created eval-queue, eval-queue-bk, eval-results, and eval-results-bk folders.

Code logic for more complex edits

You'll find

  • the main table's column names and properties in src/display/utils.py
  • the logic to read all results and request files, then convert them into dataframe rows, in src/leaderboard/read_evals.py and src/populate.py
  • the logic to allow or filter submissions in src/submission/submit.py and src/submission/check_validity.py
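The read-and-convert step in the second bullet can be sketched as follows. This is a hypothetical simplification, not the actual code in src/leaderboard/read_evals.py — the real code extracts many more columns (dtype, revision, averages, and so on):

```python
import glob
import json
import os

def results_to_rows(results_dir):
    """Flatten every results JSON file into one dict (row) per model."""
    rows = []
    pattern = os.path.join(results_dir, "**", "*.json")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        row = {"model": data["config"]["model_name"]}
        # One column per task, taking the first metric of each task.
        for task, metrics in data["results"].items():
            row[task] = next(iter(metrics.values()))
        rows.append(row)
    return rows
```

The resulting rows can then be handed to pandas (e.g. `pandas.DataFrame(rows)`) to build the leaderboard table.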