V-STaR committed on
Commit 84ea603 · verified · 1 Parent(s): 93a22bf

Update constants.py

Files changed (1)
  1. constants.py +10 -22
constants.py CHANGED
@@ -181,48 +181,36 @@ COLUMN_NAMES_I2V = MODEL_INFO_TAB_I2V + I2V_TAB
 LEADERBORAD_INTRODUCTION = """# V-STaR Leaderboard
 
 *"Can Video-LLMs ``reason through a sequential spatio-temporal logic” in videos?"*
-🏆 Welcome to the leaderboard of the **V-STaR**! 🎦 *A Comprehensive Benchmark Suite for Video-LLMs* [![Code](https://img.shields.io/github/stars/Vchitect/VBench.svg?style=social&label=Official)](https://github.com/V-STaR-Bench/V-STaR)
+🏆 Welcome to the leaderboard of the **V-STaR**! 🎦 *A spatio-temporal reasoning benchmark for Video-LLMs* #[![Code](https://img.shields.io/github/stars/V-STaR-Bench/V-STaR.svg?style=social&label=Official)](https://github.com/V-STaR-Bench/V-STaR)
 <div style="display: flex; flex-wrap: wrap; align-items: center; gap: 10px;">
-<a href='https://arxiv.org/abs/2311.17982'><img src='https://img.shields.io/badge/cs.CV-Paper-b31b1b?logo=arxiv&logoColor=red'></a>
+<a href=''><img src='https://img.shields.io/badge/cs.CV-Paper-b31b1b?logo=arxiv&logoColor=red'></a>
 <a href='https://v-star-bench.github.io/'><img src='https://img.shields.io/badge/VBench-Website-green?logo=googlechrome&logoColor=green'></a>
-<a href='https://pypi.org/project/vbench/'><img src='https://img.shields.io/pypi/v/vbench'></a>
-<a href='https://www.youtube.com/watch?v=7IhCC8Qqn8Y'><img src='https://img.shields.io/badge/YouTube-Video-c4302b?logo=youtube&logoColor=red'></a>
-<a href='https://hits.seeyoufarm.com'><img src='https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FVchitect%2FVBench&count_bg=%23FFA500&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=visitors&edge_flat=false'></a>
 </div>
 
-- **Comprehensive Dimensions:** We carefully decompose video generation quality into 16 comprehensive dimensions to reveal individual model's strengths and weaknesses.
-- **Human Alignment:** We conducted extensive experiments and human annotations to validate robustness of VBench.
-- **Valuable Insights:** VBench provides multi-perspective insights useful for the community.
+- **Comprehensive Dimensions:** We evaluate Video-LLM’s spatio-temporal reasoning ability in answering questions explicitly in the context of “when”, “where”, and “what”.
+- **Human Alignment:** We conducted extensive experiments and human annotations to validate robustness of V-STaR.
+- **Valuable Insights:** V-STaR reveals a fundamental weakness in existing Video-LLMs regarding causal spatio-temporal reasoning.
 
 **Join Leaderboard**: Please see the [instructions](https://github.com/Vchitect/VBench/tree/master?tab=readme-ov-file#trophy-leaderboard) for 3 options to participate. One option is to follow [VBench Usage info](https://github.com/Vchitect/VBench?tab=readme-ov-file#usage), and upload the generated `result.json` file here. After clicking the `Submit here!` button, click the `Refresh` button.
-**Model Information**: What are the details of these Video Generation Models? See [HERE](https://github.com/Vchitect/VBench/tree/master/sampled_videos#what-are-the-details-of-the-video-generation-models)
 
-**Credits**: This leaderboard is updated and maintained by the team of [VBench Contributors](https://github.com/Vchitect/VBench?tab=readme-ov-file#muscle-vbench-contributors).
+**Credits**: This leaderboard is updated and maintained by the team of [V-STaR Contributors]().
 """
 
 SUBMIT_INTRODUCTION = """# Submit on VBench Benchmark Introduction
 ## 🎈
-1. Please note that you need to obtain the file `evaluation_results/*.json` by running VBench in Github. You may conduct an [Offline Check](https://github.com/Vchitect/VBench?tab=readme-ov-file#get-final-score-and-submit-to-leaderboard) before uploading.
-2. Then, pack these JSON files into a `ZIP` archive, ensuring that the top-level directory of the ZIP contains the individual JSON files.
-3. Finally, upload the ZIP archive below.
-⚠️ Uploading generated videos or images of the model is invalid!
-⚠️ Submissions that do not correctly fill in the model name and model link may be deleted by the VBench team. The contact information you filled in will not be made public.
+⚠️ Please note that you need to obtain the file `results/*.json` by running V-STaR in Github. You may conduct an [Offline Eval]() before submitting.
+⚠️ Then, please contact us for updating your results via [email1]([email protected]) or [email2](hu.[email protected]).
 """
 
 TABLE_INTRODUCTION = """
 """
 
 LEADERBORAD_INFO = """
-VBench, a comprehensive benchmark suite for video generative models. We design a comprehensive and hierarchical Evaluation Dimension Suite to decompose "video generation quality" into multiple well-defined dimensions to facilitate fine-grained and objective evaluation. For each dimension and each content category, we carefully design a Prompt Suite as test cases, and sample Generated Videos from a set of video generation models. For each evaluation dimension, we specifically design an Evaluation Method Suite, which uses carefully crafted method or designated pipeline for automatic objective evaluation. We also conduct Human Preference Annotation for the generated videos for each dimension, and show that VBench evaluation results are well aligned with human perceptions. VBench can provide valuable insights from multiple perspectives.
+V-STaR, a comprehensive spatio-temporal reasoning benchmark for video large language models (Video-LLMs). We construct a fine-grained reasoning dataset with coarse-to-fine CoT questions, enabling a structured evaluation of spatio-temporal reasoning. Specifically, we introduce a Reverse Spatio-Temporal Reasoning (RSTR) task to quantify models’ spatio-temporal reasoning ability. For each dimension and each content category, we carefully design a Prompt Suite as test cases, and sample Generated Videos from a set of video generation models. Experiments on V-STaR reveal although many models perform well on “what”, some struggle to ground their answers in time and location. This finding highlights a fundamental weakness in existing Video-LLMs regarding causal spatio-temporal reasoning and inspires research in improving trustworthy spatio-temporal understanding in future Video-LLMs.
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
-CITATION_BUTTON_TEXT = r"""@inproceedings{huang2023vbench,
-title={{VBench}: Comprehensive Benchmark Suite for Video Generative Models},
-author={Huang, Ziqi and He, Yinan and Yu, Jiashuo and Zhang, Fan and Si, Chenyang and Jiang, Yuming and Zhang, Yuanhan and Wu, Tianxing and Jin, Qingyang and Chanpaisit, Nattapol and Wang, Yaohui and Chen, Xinyuan and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
-booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
-year={2024}
-}"""
+CITATION_BUTTON_TEXT = r"""to be updated"""
 
 QUALITY_CLAIM_TEXT = "We use all the videos on Sora website (https://openai.com/sora) for a preliminary evaluation, including the failure case videos Sora provided."
 
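
For context, here is a minimal sketch of how a Gradio app in this Space might consume the updated constants. The app code is not part of this commit, so the widget layout, component choices, and the `constants`/`app` module names below are assumptions for illustration, not the Space's actual implementation.

```python
# Hypothetical app sketch (assumed Gradio-based Space); only the constant names
# come from constants.py in this commit — everything else is illustrative.
import gradio as gr

from constants import (
    LEADERBORAD_INTRODUCTION,   # leaderboard intro markdown (name as spelled in the file)
    LEADERBORAD_INFO,           # V-STaR benchmark description
    SUBMIT_INTRODUCTION,        # submission instructions (now: email results/*.json)
    CITATION_BUTTON_LABEL,
    CITATION_BUTTON_TEXT,       # currently the placeholder "to be updated"
)

with gr.Blocks() as demo:
    # Intro markdown: badges, the three bullet points, and credits render at the top.
    gr.Markdown(LEADERBORAD_INTRODUCTION)

    with gr.Tab("Submit"):
        gr.Markdown(SUBMIT_INTRODUCTION)

    with gr.Accordion("About V-STaR"):
        gr.Markdown(LEADERBORAD_INFO)

    # Citation exposed as a copyable textbox, using the label/text constants.
    gr.Textbox(
        value=CITATION_BUTTON_TEXT,
        label=CITATION_BUTTON_LABEL,
        show_copy_button=True,
    )

if __name__ == "__main__":
    demo.launch()
```

Under this assumption, the diff above changes only the text the UI displays (intro, badges, submission instructions, benchmark description, citation placeholder); no application logic needs to change.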