uparekh01151

Initial commit for DataEngEval

acd8e16 about 1 month ago

Hugging Face Spaces Deployment Guide

This guide explains how to deploy the NL→SQL Leaderboard on Hugging Face Spaces.

🚀 Quick Deployment

Step 1: Create a New Space

Go to Hugging Face Spaces (https://huggingface.co/spaces)
Click "Create new Space"
Fill in the details:
- Space name: DataEngEval (or your preferred name)
- License: Choose appropriate license
- Visibility: Public or Private
- SDK: Gradio
- Hardware: CPU Basic (sufficient for this app)

Step 2: Upload Your Code

Option A: Git Clone and Push

# Clone your repository
git clone <your-repo-url>
cd dataeng-leaderboard  # use the directory name that git clone created

# Add Hugging Face Space as remote
git remote add hf https://huggingface.co/spaces/your-username/DataEngEval

# Push to Hugging Face
git push hf main

Option B: Direct Upload

Upload all files to your Space using the web interface
Include every file listed under "Required Files for Deployment" and keep the same directory layout

Step 3: Configure Environment (Optional)

Go to your Space settings
Add secrets if needed:
- HF_TOKEN: Your Hugging Face API token (for real model inference)
Without a token, the app still works by falling back to mock mode

Step 4: Deploy

The Space builds and deploys automatically once the files are pushed or uploaded. The app is reachable at the Space's URL when the build finishes.

📁 Required Files for Deployment

Make sure these files are present in your Space:

├── app.py                     # ✅ Main application
├── requirements.txt           # ✅ Dependencies
├── config/
│   └── models.yaml           # ✅ Model configurations
├── src/
│   ├── evaluator.py          # ✅ Evaluation logic
│   ├── models_registry.py    # ✅ Model interfaces
│   └── scoring.py            # ✅ Scoring logic
├── tasks/                    # ✅ Datasets
│   ├── nyc_taxi_small/
│   ├── tpch_tiny/
│   └── ecommerce_orders_small/
├── prompts/                  # ✅ SQL templates
│   ├── template_presto.txt
│   ├── template_bigquery.txt
│   └── template_snowflake.txt
└── README.md                 # ✅ Documentation
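
A quick pre-push check can catch a missing file before the Space build does. The sketch below is not part of the project: the function name `find_missing` is hypothetical, and the path list is simply copied from the tree above, so adjust it if your layout differs.

```python
from pathlib import Path

# Paths copied from the tree above (assumed to be the full required set).
REQUIRED_PATHS = [
    "app.py",
    "requirements.txt",
    "config/models.yaml",
    "src/evaluator.py",
    "src/models_registry.py",
    "src/scoring.py",
    "tasks",
    "prompts",
    "README.md",
]


def find_missing(root: str = ".") -> list[str]:
    """Return the required paths that do not exist under root."""
    base = Path(root)
    return [p for p in REQUIRED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing required paths:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All required paths are present.")
```

Run it from the repository root before `git push hf main`.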

🔧 Configuration

Model Configuration

Edit config/models.yaml to add/remove models:

models:
  - name: "Your Model"
    provider: "huggingface"
    model_id: "your/model-id"
    params:
      max_new_tokens: 256
      temperature: 0.1
    description: "Your model description"
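
Because a malformed entry is a common reason for "models not working", it can help to validate the parsed file before pushing. This is a minimal sketch, not the project's own loader: `validate_models` is a hypothetical helper, and treating `name`, `provider`, and `model_id` as required is an assumption based on the example above. It takes the already parsed data (what `yaml.safe_load` from PyYAML would return) so it runs without extra dependencies.

```python
# Keys assumed to be mandatory, based on the example entry above.
REQUIRED_KEYS = ("name", "provider", "model_id")


def validate_models(config: dict) -> list[str]:
    """Return human-readable problems found in a parsed models.yaml."""
    problems = []
    models = config.get("models") if isinstance(config, dict) else None
    if not isinstance(models, list) or not models:
        return ["top-level 'models' must be a non-empty list"]
    for i, entry in enumerate(models):
        if not isinstance(entry, dict):
            problems.append(f"models[{i}] must be a mapping")
            continue
        for key in REQUIRED_KEYS:
            if not entry.get(key):
                problems.append(f"models[{i}] is missing '{key}'")
        # params is optional, but when present it should be a mapping.
        if not isinstance(entry.get("params", {}), dict):
            problems.append(f"models[{i}].params must be a mapping")
    return problems


# This dict mirrors the YAML example above after parsing.
example = {
    "models": [
        {
            "name": "Your Model",
            "provider": "huggingface",
            "model_id": "your/model-id",
            "params": {"max_new_tokens": 256, "temperature": 0.1},
            "description": "Your model description",
        }
    ]
}
print(validate_models(example))  # prints []
```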

Environment Variables

Set these in your Space settings:

HF_TOKEN: Hugging Face API token (optional; store it as a secret rather than a public variable)
MOCK_MODE: Set to "true" to force mock mode
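
How the two variables interact can be sketched as follows. This is an illustration of the behavior described in this guide (mock mode when forced, or when no token is set), not the app's actual code; `resolve_mode` is a hypothetical name, and the token value is a placeholder.

```python
import os


def resolve_mode(env=None) -> str:
    """Return "mock" or "remote" based on MOCK_MODE and HF_TOKEN."""
    env = os.environ if env is None else env
    forced = env.get("MOCK_MODE", "").strip().lower() == "true"
    has_token = bool(env.get("HF_TOKEN", "").strip())
    # Mock mode wins when forced; otherwise fall back to it if no token is set.
    return "mock" if forced or not has_token else "remote"


print(resolve_mode({"HF_TOKEN": "hf_example"}))                       # remote
print(resolve_mode({"HF_TOKEN": "hf_example", "MOCK_MODE": "true"}))  # mock
print(resolve_mode({}))                                               # mock
```

Passing a plain dict instead of `os.environ` keeps the decision easy to test locally.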

🚀 Features

Automatic Features

Auto-deployment: Changes pushed to Git trigger automatic rebuilds
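Persistent storage: Leaderboard results persist across deployments only when they are written to persistent storage; the default Space disk is reset on rebuild or restart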
Mock mode: Works without API keys for demos
Remote inference: No heavy model downloads

Performance Optimizations

Lightweight dependencies
Remote model inference
Efficient DuckDB execution
Minimal memory footprint

🐛 Troubleshooting

Common Issues

Build fails: Check that all required files are present and that every package listed in requirements.txt installs cleanly

App doesn't start: Verify app.py is in the root directory

Models not working: Check config/models.yaml format and model IDs

Datasets not loading: Ensure all dataset files are in tasks/ directory

Debug Mode

To debug locally before deploying:

# Install dependencies
pip install -r requirements.txt

# Run locally
gradio app.py

# Test with mock mode
export MOCK_MODE=true
gradio app.py

📊 Monitoring

Space Logs

Check the "Logs" tab in your Space for runtime errors
Monitor memory usage in the "Settings" tab

Performance

CPU usage should be minimal (remote inference)
Memory usage should be low (no local models)
Response times depend on the Hugging Face Inference API

🔄 Updates

Updating Your Space

Make changes to your code
Commit and push to your Space's Git repository
The Space will automatically rebuild

Adding New Models

Edit config/models.yaml
Push changes to your Space
New models will be available once the Space finishes rebuilding

Adding New Datasets

Create a new folder in tasks/
Add required files (schema.sql, loader.py, cases.yaml)
Push changes to your Space
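
Before pushing a new dataset, it may be worth confirming that every folder under tasks/ contains the three files named above. A minimal sketch, assuming each dataset lives in its own subfolder; `incomplete_datasets` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

# File names taken from the steps above.
DATASET_FILES = ("schema.sql", "loader.py", "cases.yaml")


def incomplete_datasets(tasks_dir: str = "tasks") -> dict[str, list[str]]:
    """Map each dataset folder to the required files it lacks."""
    base = Path(tasks_dir)
    if not base.is_dir():
        return {}  # no tasks folder here, so there is nothing to check
    report = {}
    for folder in sorted(base.iterdir()):
        if not folder.is_dir():
            continue
        missing = [name for name in DATASET_FILES if not (folder / name).is_file()]
        if missing:
            report[folder.name] = missing
    return report


if __name__ == "__main__":
    for name, missing in incomplete_datasets().items():
        print(f"{name}: missing {', '.join(missing)}")
```

An empty result means every dataset folder has all three files.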

🎯 Best Practices

Code Organization

Keep all source code in src/ directory
Use relative imports
Minimize dependencies in requirements.txt

Performance

Use Hugging Face Inference API for models
Avoid local model loading
Keep datasets small for faster evaluation

User Experience

Provide clear error messages
Use mock mode for demos
Include comprehensive documentation

🆘 Support

If you encounter issues:

Check the Space logs for errors
Verify all required files are present
Test locally before deploying
Check the Hugging Face status page
Review the troubleshooting section above