---
title: CLIP Model Evaluation
emoji: 📊🚀
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 📊 CLIP Model Evaluation Space

This Space provides an interactive interface to evaluate the performance of various CLIP (Contrastive Language-Image Pre-training) models on standard image-text retrieval benchmarks.

It calculates Recall@K (R@1, R@5, R@10) metrics for both retrieval directions:

* **Image Retrieval (Text-to-Image):** Given a text query, how well does the model retrieve the correct image?
* **Text Retrieval (Image-to-Text):** Given an image query, how well does the model retrieve the correct text description?

Recall@K measures the fraction of queries for which the correct item appears in the top K results, so a higher Recall@1 means the model is better at placing the correct item at the very top of the ranking.
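
As a concrete illustration, here is a minimal sketch (not the Space's actual code) of how Recall@K can be computed from an image-text similarity matrix, assuming exactly one matching image per text query; the `recall_at_k` helper and the random matrix below are purely illustrative:

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, ks=(1, 5, 10)) -> dict:
    """Text-to-image Recall@K for a [num_texts x num_images] similarity
    matrix in which text i is paired with image i."""
    order = np.argsort(-similarity, axis=1)            # images ranked best-first per query
    correct = np.arange(similarity.shape[0])[:, None]  # ground-truth image index per query
    ranks = np.argmax(order == correct, axis=1)        # 0-based rank of the correct image
    return {f"R@{k}": float((ranks < k).mean()) for k in ks}

# Toy example: 4 text queries vs. 4 images (pair i sits on the diagonal).
sim = np.random.rand(4, 4)
print(recall_at_k(sim))    # text-to-image (image retrieval)
print(recall_at_k(sim.T))  # image-to-text (text retrieval)
```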

## 🚀 How to Use

1. **Select a CLIP Model:** Choose a pre-trained CLIP model from the dropdown menu.
2. **Select a Dataset:** Choose the dataset you want to evaluate on (e.g., "mscoco", "flickr").
3. **Number of Samples:** Specify the number of image-text pairs from the dataset to use for the evaluation. Using fewer samples will be faster but less representative.
4. **Click "Evaluate Model":** The evaluation will run, and the Recall@K metrics will be displayed.

## 🛠️ Under the Hood

This Space uses the `evaluate` library from Hugging Face and a custom metric script (`clipmodel_eval.py`) to perform the CLIP model evaluations. The models and datasets are loaded from the Hugging Face Hub.
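
The exact interface of `clipmodel_eval.py` is not documented here, but the core computation can be sketched with the standard `transformers` API. The checkpoint name, dummy images, and captions below are illustrative stand-ins rather than what the Space actually loads:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"            # example checkpoint, not a fixed choice
model = CLIPModel.from_pretrained(model_id).eval()
processor = CLIPProcessor.from_pretrained(model_id)

# Dummy stand-ins for dataset samples; the Space pulls real image-caption
# pairs (e.g., MSCOCO or Flickr) from the Hugging Face Hub instead.
images = [Image.new("RGB", (224, 224), c) for c in ("red", "green", "blue")]
texts = ["a red square", "a green square", "a blue square"]

inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_text has shape [num_texts, num_images]; pair i should score highest
# in column i. Recall@K is computed from this matrix, as sketched above.
similarity = outputs.logits_per_text
print(similarity.argmax(dim=-1))  # index of the top-ranked image for each text
```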