
Hugging Face Spaces Deployment Guide

This guide explains how to deploy the NL→SQL Leaderboard on Hugging Face Spaces.

🚀 Quick Deployment

Step 1: Create a New Space

  1. Go to Hugging Face Spaces (https://huggingface.co/spaces)
  2. Click "Create new Space"
  3. Fill in the details:
    • Space name: DataEngEval (or your preferred name)
    • License: Choose an appropriate license
    • Visibility: Public or Private
    • SDK: Gradio
    • Hardware: CPU Basic (sufficient for this app)

Step 2: Upload Your Code

Option A: Git Clone and Push

# Clone your repository
git clone <your-repo-url>
cd dataeng-leaderboard

# Add Hugging Face Space as remote
git remote add hf https://huggingface.co/spaces/your-username/DataEngEval

# Push to Hugging Face
git push hf main

Option B: Direct Upload

  1. Upload all files to your Space using the web interface
  2. Make sure to include all files from the project structure

Step 3: Configure Environment (Optional)

  1. Go to your Space settings
  2. Add secrets if needed:
    • HF_TOKEN: Your Hugging Face API token (for real model inference)
  3. The app will work without tokens using mock mode

Step 4: Deploy

The Space will automatically build and deploy. You'll see the URL once ready.

πŸ“ Required Files for Deployment

Make sure these files are present in your Space:

├── app.py                     # ✅ Main application
├── requirements.txt           # ✅ Dependencies
├── config/
│   └── models.yaml           # ✅ Model configurations
├── src/
│   ├── evaluator.py          # ✅ Evaluation logic
│   ├── models_registry.py    # ✅ Model interfaces
│   └── scoring.py            # ✅ Scoring logic
├── tasks/                    # ✅ Datasets
│   ├── nyc_taxi_small/
│   ├── tpch_tiny/
│   └── ecommerce_orders_small/
├── prompts/                  # ✅ SQL templates
│   ├── template_presto.txt
│   ├── template_bigquery.txt
│   └── template_snowflake.txt
└── README.md                 # ✅ Documentation (keep the Space's YAML metadata header at the top)
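Before pushing, you can check that the files listed above are present. The following is a minimal sketch. `REQUIRED_PATHS` simply mirrors the tree above and `missing_files` is a hypothetical helper, not part of the project, so adjust both if your layout differs.

```python
from pathlib import Path

# Mirrors the project tree in this guide; an assumption, edit to match your repo.
REQUIRED_PATHS = [
    "app.py",
    "requirements.txt",
    "config/models.yaml",
    "src/evaluator.py",
    "src/models_registry.py",
    "src/scoring.py",
    "tasks",
    "tasks/nyc_taxi_small",
    "tasks/tpch_tiny",
    "tasks/ecommerce_orders_small",
    "prompts/template_presto.txt",
    "prompts/template_bigquery.txt",
    "prompts/template_snowflake.txt",
    "README.md",
]


def missing_files(root: str = ".") -> list[str]:
    """Return the required paths (files or folders) that do not exist under root."""
    base = Path(root)
    return [p for p in REQUIRED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing before deploy:", ", ".join(missing))
    else:
        print("All required files present.")
```

Run it from the repository root. An empty result means that every listed path exists, but it does not guarantee that the files are correct.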

🔧 Configuration

Model Configuration

Edit config/models.yaml to add/remove models:

models:
  - name: "Your Model"
    provider: "huggingface"
    model_id: "your/model-id"
    params:
      max_new_tokens: 256
      temperature: 0.1
    description: "Your model description"
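A malformed models.yaml is one cause of "Models not working". The following is a rough sketch of a pre-push check on the parsed file. It assumes that `name`, `provider` and `model_id` are required (taken from the example entry above) and that the file is parsed with PyYAML's `yaml.safe_load`. `validate_models` and `load_models` are hypothetical helpers, not part of the project, and the app's own loader may enforce different rules.

```python
from typing import Any

# Keys taken from the example entry above; an assumption about what the app needs.
REQUIRED_KEYS = {"name", "provider", "model_id"}


def validate_models(config: Any) -> list[str]:
    """Return problems found in a parsed models.yaml (an empty list means it looks OK)."""
    models = config.get("models") if isinstance(config, dict) else None
    if not isinstance(models, list) or not models:
        return ["top-level 'models' must be a non-empty list"]
    problems = []
    seen_names = set()
    for i, entry in enumerate(models):
        if not isinstance(entry, dict):
            problems.append(f"models[{i}] is not a mapping")
            continue
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"models[{i}] missing keys: {sorted(missing)}")
        name = entry.get("name")
        if name is not None:
            if name in seen_names:
                problems.append(f"models[{i}] duplicate name: {name!r}")
            seen_names.add(name)
        params = entry.get("params", {})
        if not isinstance(params, dict):
            problems.append(f"models[{i}] params must be a mapping")
            continue
        max_tokens = params.get("max_new_tokens")
        bad_type = isinstance(max_tokens, bool) or not isinstance(max_tokens, int)
        if max_tokens is not None and (bad_type or max_tokens <= 0):
            problems.append(f"models[{i}] max_new_tokens must be a positive integer")
    return problems


def load_models(path: str = "config/models.yaml") -> Any:
    """Parse the YAML file; assumes PyYAML is installed (it is not in the stdlib)."""
    import yaml  # imported lazily so validate_models works without PyYAML

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)
```

Typical use is `validate_models(load_models())`. This check cannot confirm that a `model_id` actually exists on the Hub.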

Environment Variables

Set these in your Space settings:

  • HF_TOKEN: Hugging Face API token (optional)
  • MOCK_MODE: Set to "true" to force mock mode
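Space secrets and variables reach the app as environment variables. The following is a minimal sketch of how those two settings could combine, assuming the behaviour described in this guide: mock mode is used when `MOCK_MODE` is "true" or when no `HF_TOKEN` is set. `should_use_mock` is a hypothetical name, and the real check in app.py may differ.

```python
import os
from typing import Mapping, Optional


def should_use_mock(env: Optional[Mapping[str, str]] = None) -> bool:
    """Sketch: mock mode if forced via MOCK_MODE or if HF_TOKEN is missing/blank."""
    env = os.environ if env is None else env
    forced = env.get("MOCK_MODE", "").strip().lower() == "true"
    has_token = bool(env.get("HF_TOKEN", "").strip())
    return forced or not has_token


if __name__ == "__main__":
    print("mock mode" if should_use_mock() else "remote inference")
```

Passing `env` explicitly keeps the function easy to test without changing the real process environment.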

🚀 Features

Automatic Features

  • Auto-deployment: Changes pushed to Git trigger automatic rebuilds
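  • Persistent storage: Leaderboard results persist across deployments as long as they are saved to persistent storage or an external store (such as a Hub dataset), because a Space's default disk is ephemeral and is reset on restart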
  • Mock mode: Works without API keys for demos
  • Remote inference: No heavy model downloads

Performance Optimizations

  • Lightweight dependencies
  • Remote model inference
  • Efficient DuckDB execution
  • Minimal memory footprint

πŸ› Troubleshooting

Common Issues

Build fails: Check that all required files are present and requirements.txt is correct

App doesn't start: Verify app.py is in the root directory

Models not working: Check config/models.yaml format and model IDs

Datasets not loading: Ensure all dataset files are in tasks/ directory

Debug Mode

To debug locally before deploying:

# Install dependencies
pip install -r requirements.txt

# Run locally
gradio app.py

# Test with mock mode
export MOCK_MODE=true
gradio app.py

📊 Monitoring

Space Logs

  • Check the "Logs" tab in your Space for runtime errors
  • Monitor memory usage in the "Settings" tab

Performance

  • CPU usage should be minimal (remote inference)
  • Memory usage should be low (no local models)
  • Response times depend on Hugging Face Inference API
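Response times depend on the remote API, so it can help to measure them directly. The following is a small sketch that times any callable over a list of prompts. `time_calls` is hypothetical, and the callable you pass in stands in for whatever inference call src/models_registry.py makes.

```python
import statistics
import time
from typing import Callable, Iterable


def time_calls(fn: Callable[[str], object], prompts: Iterable[str]) -> dict:
    """Call fn once per prompt and summarise wall-clock latency in seconds."""
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        fn(prompt)  # e.g. a remote inference call; the result is ignored here
        latencies.append(time.perf_counter() - start)
    if not latencies:
        raise ValueError("need at least one prompt to time")
    return {
        "calls": len(latencies),
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }
```

For example, `time_calls(generate_sql, questions)` would summarise the latency of a hypothetical `generate_sql` wrapper. Running it locally and on the Space helps separate network and API latency from the app's own processing time.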

🔄 Updates

Updating Your Space

  1. Make changes to your code
  2. Commit and push to your Space's Git repository
  3. The Space will automatically rebuild

Adding New Models

  1. Edit config/models.yaml
  2. Push changes to your Space
  3. New models will be available as soon as the rebuild finishes

Adding New Datasets

  1. Create a new folder in tasks/
  2. Add required files (schema.sql, loader.py, cases.yaml)
  3. Push changes to your Space
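The following is a quick sketch for confirming that every dataset folder has the three files listed above before you push. `check_tasks` is a hypothetical helper, and the project's loader may require additional files.

```python
from pathlib import Path

# Per-dataset files listed in this guide; the real loader may expect more.
TASK_FILES = ("schema.sql", "loader.py", "cases.yaml")


def check_tasks(tasks_dir: str = "tasks") -> dict[str, list[str]]:
    """Map each dataset folder under tasks_dir to its missing files ([] = complete)."""
    root = Path(tasks_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{tasks_dir!r} is not a directory")
    report = {}
    for task in sorted(root.iterdir()):
        # Skip plain files and hidden/cache folders such as .git or __pycache__.
        if not task.is_dir() or task.name.startswith((".", "__")):
            continue
        report[task.name] = [f for f in TASK_FILES if not (task / f).is_file()]
    return report
```

`{name: m for name, m in check_tasks().items() if m}` then lists only the incomplete datasets.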

🎯 Best Practices

Code Organization

  • Keep all source code in src/ directory
  • Use relative imports
  • Minimize dependencies in requirements.txt

Performance

  • Use Hugging Face Inference API for models
  • Avoid local model loading
  • Keep datasets small for faster evaluation

User Experience

  • Provide clear error messages
  • Use mock mode for demos
  • Include comprehensive documentation

📚 Additional Resources

🆘 Support

If you encounter issues:

  1. Check the Space logs for errors
  2. Verify all required files are present
  3. Test locally before deploying
  4. Check the Hugging Face status page
  5. Review the troubleshooting section above