
---
title: Turkish MMLU Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: cc-by-nc-4.0
short_description: Leaderboard showcasing Turkish MMLU dataset results.
---

πŸ† Turkish MMLU Leaderboard

A web application for exploring, evaluating, and comparing AI model performance on the Turkish Massive Multitask Language Understanding (MMLU) benchmark.

Features

  • πŸ“Š Interactive leaderboard with filtering capabilities
  • πŸ” Search through model responses
  • πŸ“ˆ Visualize section-wise performance results
  • βž• Submit new models for evaluation

Local Development

Prerequisites

  • Python 3.8+
  • pip

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/turkish_mmlu_leaderboard.git
    cd turkish_mmlu_leaderboard
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Run the application:

    python app.py
    
  4. Open your browser and navigate to http://127.0.0.1:7860

Deploying to Hugging Face Spaces

Option 1: Using the Hugging Face UI

  1. Go to Hugging Face Spaces
  2. Click "Create a new Space"
  3. Select "Gradio" as the SDK
  4. Upload your files or connect to your GitHub repository
  5. The Space will automatically build and deploy

Option 2: Using the Dockerfile

  1. Create a new Space on Hugging Face
  2. Select "Docker" as the SDK
  3. Upload your files including the Dockerfile
  4. The Space will build and deploy using your Dockerfile

Troubleshooting Hugging Face Deployment

If you encounter timeout issues when loading datasets:

  1. Check the Space logs for specific error messages
  2. Increase the timeout values in config.py
  3. Make sure your datasets are accessible from Hugging Face Spaces
  4. Consider using smaller datasets or pre-caching data
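Beyond raising the timeout values (item 2), transient failures can be absorbed by wrapping dataset loads in a small retry helper. The sketch below is illustrative only — `load_with_retry` is not an existing function in this repo — and assumes that a failed load raises an exception:

```python
import time

def load_with_retry(load_fn, max_retries=5, base_delay=2.0):
    """Call load_fn(), retrying with exponential backoff on failure.

    Hypothetical helper; adapt the retry count and delay to the
    timeout values you set in config.py.
    """
    for attempt in range(max_retries):
        try:
            return load_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))
```

On the last attempt the original exception is re-raised, so the Space logs still show the underlying error message.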

Configuration

The application can be configured by modifying the config.py file:

  • DatasetConfig: Configure dataset paths, cache settings, and refresh intervals
  • UIConfig: Customize the UI appearance
  • ModelConfig: Define model-related options
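As a rough sketch of what those three classes might hold — every field name and default below is an assumption for illustration, not the actual contents of config.py:

```python
from dataclasses import dataclass

@dataclass
class DatasetConfig:
    # Hypothetical fields; check config.py for the real names.
    results_dataset: str = "org/results-dataset"  # placeholder Hub path
    cache_dir: str = "./cache"
    refresh_interval_s: int = 3600  # how often to re-pull leaderboard data

@dataclass
class UIConfig:
    title: str = "Turkish MMLU Leaderboard"
    rows_per_page: int = 25

@dataclass
class ModelConfig:
    allowed_dtypes: tuple = ("torch.float16", "torch.bfloat16", "8bit", "4bit")
```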

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Start the configuration

Most of the variables to change for a default leaderboard are in src/env.py (replace the paths with your leaderboard's) and src/about.py (for the tasks).

Results files should be valid JSON with the following structure, where model_dtype can be "torch.float16", "torch.bfloat16", "8bit", or "4bit":

{
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "path of the model on the hub: org/model",
        "model_sha": "revision on the hub"
    },
    "results": {
        "task_name": {
            "metric_name": score
        },
        "task_name2": {
            "metric_name": score
        }
    }
}
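For instance, a minimal results file matching that schema could be written like this (the task names, metric name, scores, and output filename are placeholders):

```python
import json

# Placeholder values filling in the schema above.
result = {
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "org/model",  # path of the model on the Hub
        "model_sha": "main",        # revision on the Hub
    },
    "results": {
        "task_name": {"metric_name": 0.75},
        "task_name2": {"metric_name": 0.62},
    },
}

with open("results_org_model.json", "w", encoding="utf-8") as f:
    json.dump(result, f, indent=4)
```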

Request files are created automatically by this tool.

If you encounter a problem on the Space, don't hesitate to restart it; this removes the created eval-queue, eval-queue-bk, eval-results, and eval-results-bk folders.

Code logic for more complex edits

You'll find

  • the main table's column names and properties in src/display/utils.py
  • the logic to read all results and request files, then convert them into dataframe rows, in src/leaderboard/read_evals.py and src/populate.py
  • the logic to allow or filter submissions in src/submission/submit.py and src/submission/check_validity.py
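The read-and-convert step in the second bullet can be sketched as follows. This is a hypothetical simplification, not the actual code in src/leaderboard/read_evals.py — the real code extracts many more columns (dtype, revision, averages, and so on):

```python
import glob
import json
import os

def results_to_rows(results_dir):
    """Flatten every results JSON file into one dict (row) per model."""
    rows = []
    pattern = os.path.join(results_dir, "**", "*.json")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        row = {"model": data["config"]["model_name"]}
        # One column per task, taking the first metric of each task.
        for task, metrics in data["results"].items():
            row[task] = next(iter(metrics.values()))
        rows.append(row)
    return rows
```

The resulting rows can then be handed to pandas (e.g. `pandas.DataFrame(rows)`) to build the leaderboard table.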