from datetime import datetime
import pytz

ABOUT_TEXT = """
## Overview
The prompt-video pairs are sourced from [VideoGen-Eval](https://ailab-cvc.github.io/VideoGen-Eval/), a dataset featuring a diverse range of prompts and videos generated by state-of-the-art video diffusion models (VDMs). Our benchmark comprises 26.5k video pairs, each annotated with a corresponding preference label.

<img src="https://i.postimg.cc/J7XhVLTh/image.png" alt="Video Duration and Resolution in VideoGen-RewardBench" style="width:400px;">

We report two accuracy metrics: ties-included accuracy **(w/ Ties)** and ties-excluded accuracy **(w/o Ties)**.
- For ties-excluded accuracy, we exclude all data labeled as "ties" and use only data labeled as "A wins" or "B wins". We compute the rewards for each prompt-video pair, convert the relative reward relationships into binary labels, and calculate classification accuracy.
- For ties-included accuracy, we adopt Algorithm 1 proposed by [Ties Matter](https://arxiv.org/pdf/2305.14324). This method traverses all possible tie thresholds, calculates three-class accuracy for each threshold, and selects the highest accuracy as the final metric. See [calc_accuracy](https://github.com/KwaiVGI/VideoAlign/blob/main/calc_accuracy.py#L22) for the implementation of ties-included accuracy.
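
The threshold sweep behind the ties-included metric can be sketched as follows (an illustrative sketch, not the official `calc_accuracy.py` code; the function name and the `"A"`/`"B"`/`"tie"` label encoding are assumptions):

```python
# Sketch of ties-included accuracy (Ties Matter, Algorithm 1).
# reward_diffs: (reward_A - reward_B) per pair; labels: "A", "B", or "tie".
def ties_included_accuracy(reward_diffs, labels):
    # Candidate tie thresholds: zero plus every observed |difference|.
    thresholds = sorted({0.0, *(abs(d) for d in reward_diffs)})
    best = 0.0
    for t in thresholds:
        correct = 0
        for d, y in zip(reward_diffs, labels):
            if abs(d) <= t:
                pred = "tie"
            elif d > 0:
                pred = "A"
            else:
                pred = "B"
            correct += pred == y
        # Keep the best three-class accuracy over all thresholds.
        best = max(best, correct / len(labels))
    return best
```
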
We include multiple types of reward models in this evaluation:
1. **Sequence Classifiers** (Seq. Classifier): Models that take in a prompt and a video and output a score.
2. **Custom Classifiers**: Research models with different architectures and training objectives.
3. **Random**: A random-choice baseline.
4. **Generative**: Prompting fine-tuned models to choose between two answers.

Note: Models marked with (*) are independently submitted scores that have not been verified by the VideoGen-RewardBench team.
## Acknowledgments
Our leaderboard is built on [RewardBench](https://huggingface.co/spaces/allenai/reward-bench). The prompt-video pairs are sourced from [VideoGen-Eval](https://ailab-cvc.github.io/VideoGen-Eval/). We sincerely thank all the contributors!
"""

# Get the Pacific time zone (handles PST/PDT automatically)
pacific_tz = pytz.timezone('America/Los_Angeles')
current_time = datetime.now(pacific_tz).strftime("%H:%M %Z, %d %b %Y")

TOP_TEXT = f"""# VideoGen-RewardBench: Evaluating Reward Models for Video Generation
### Evaluating the capabilities of reward models for video generation.
[Code](https://github.com/KwaiVGI/VideoAlign) | [Project](https://gongyeliu.github.io/videoalign/) | [Eval. Dataset](https://huggingface.co/datasets/KwaiVGI/VideoGen-RewardBench) | [Paper](https://arxiv.org/abs/2501.13918) | Total models: {{}} | * Unverified models | ⚠️ Dataset Contamination | Last restart (PST): {current_time}
"""
SUBMIT_TEXT = r"""
## How to Submit Your Results on VideoGen-RewardBench
Please follow the steps below to submit your reward model's results:

### Step 1: Create an Issue
Open an issue in the [VideoAlign GitHub repository](https://github.com/KwaiVGI/VideoAlign/issues).

### Step 2: Calculate Accuracy Metrics
Use our provided scripts to compute your model's accuracy:
- **Ties-Included Accuracy (w/ Ties):** Use [calc_accuracy_with_ties](https://github.com/KwaiVGI/VideoAlign/blob/main/calc_accuracy.py#L22C5-L22C28)
- **Ties-Excluded Accuracy (w/o Ties):** Use [calc_accuracy_without_ties](https://github.com/KwaiVGI/VideoAlign/blob/main/calc_accuracy.py#L87)
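
For reference, the ties-excluded metric amounts to binary accuracy over the non-tie pairs; a minimal sketch (the function name and `"A"`/`"B"`/`"tie"` label encoding are assumptions, and the official script may differ in details):

```python
# Sketch of ties-excluded accuracy: drop "tie" pairs, then score agreement
# between the sign of (reward_A - reward_B) and the human label.
def ties_excluded_accuracy(reward_diffs, labels):
    kept = [(d, y) for d, y in zip(reward_diffs, labels) if y != "tie"]
    correct = sum((d > 0) == (y == "A") for d, y in kept)
    return correct / len(kept)
```
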
### Step 3: Provide Your Results in the Issue
Within the issue, include your reward model's results in JSON format. For example:
```json
{
    "with_tie": {
        "overall": 61.26,
        "vq": 59.68,
        "mq": 66.03,
        "ta": 53.80
    },
    "without_tie": {
        "overall": 73.59,
        "vq": 75.66,
        "mq": 74.70,
        "ta": 72.20
    },
    "model": "VideoReward",
    "model_link": "https://huggingface.co/KwaiVGI/VideoReward",
    "model_type": "Seq. Classifiers"
}
```
Additionally, please include any relevant information about your model (e.g., a brief description, methodology, etc.).
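
Before filing the issue, you can sanity-check that your JSON has the expected fields; a minimal sketch (not an official validator; key names mirror the example above):

```python
import json

REQUIRED_KEYS = {"with_tie", "without_tie", "model", "model_link", "model_type"}
METRIC_KEYS = {"overall", "vq", "mq", "ta"}

# Return True if the submission string parses and contains all expected keys.
def validate_submission(raw):
    data = json.loads(raw)
    if not REQUIRED_KEYS <= data.keys():
        return False
    return all(METRIC_KEYS <= data[k].keys() for k in ("with_tie", "without_tie"))
```
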
### Step 4: Review and Leaderboard Update
We will review your issue promptly and update the leaderboard accordingly.
"""