V-STaR committed on
Commit 84ea603 · verified · 1 Parent(s): 93a22bf

Update constants.py

Files changed (1)
  1. constants.py +10 -22
constants.py CHANGED
@@ -181,48 +181,36 @@ COLUMN_NAMES_I2V = MODEL_INFO_TAB_I2V + I2V_TAB
 LEADERBORAD_INTRODUCTION = """# V-STaR Leaderboard
 
 *"Can Video-LLMs ``reason through a sequential spatio-temporal logic” in videos?"*
-🏆 Welcome to the leaderboard of the **V-STaR**! 🎦 *A Comprehensive Benchmark Suite for Video-LLMs* [![Code](https://img.shields.io/github/stars/Vchitect/VBench.svg?style=social&label=Official)](https://github.com/V-STaR-Bench/V-STaR)
+🏆 Welcome to the leaderboard of the **V-STaR**! 🎦 *A spatio-temporal reasoning benchmark for Video-LLMs* #[![Code](https://img.shields.io/github/stars/V-STaR-Bench/V-STaR.svg?style=social&label=Official)](https://github.com/V-STaR-Bench/V-STaR)
 <div style="display: flex; flex-wrap: wrap; align-items: center; gap: 10px;">
-<a href='https://arxiv.org/abs/2311.17982'><img src='https://img.shields.io/badge/cs.CV-Paper-b31b1b?logo=arxiv&logoColor=red'></a>
+<a href=''><img src='https://img.shields.io/badge/cs.CV-Paper-b31b1b?logo=arxiv&logoColor=red'></a>
 <a href='https://v-star-bench.github.io/'><img src='https://img.shields.io/badge/VBench-Website-green?logo=googlechrome&logoColor=green'></a>
-<a href='https://pypi.org/project/vbench/'><img src='https://img.shields.io/pypi/v/vbench'></a>
-<a href='https://www.youtube.com/watch?v=7IhCC8Qqn8Y'><img src='https://img.shields.io/badge/YouTube-Video-c4302b?logo=youtube&logoColor=red'></a>
-<a href='https://hits.seeyoufarm.com'><img src='https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FVchitect%2FVBench&count_bg=%23FFA500&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=visitors&edge_flat=false'></a>
 </div>
 
-- **Comprehensive Dimensions:** We carefully decompose video generation quality into 16 comprehensive dimensions to reveal individual model's strengths and weaknesses.
-- **Human Alignment:** We conducted extensive experiments and human annotations to validate robustness of VBench.
-- **Valuable Insights:** VBench provides multi-perspective insights useful for the community.
+- **Comprehensive Dimensions:** We evaluate Video-LLM’s spatio-temporal reasoning ability in answering questions explicitly in the context of “when”, “where”, and “what”.
+- **Human Alignment:** We conducted extensive experiments and human annotations to validate robustness of V-STaR.
+- **Valuable Insights:** V-STaR reveals a fundamental weakness in existing Video-LLMs regarding causal spatio-temporal reasoning.
 
 **Join Leaderboard**: Please see the [instructions](https://github.com/Vchitect/VBench/tree/master?tab=readme-ov-file#trophy-leaderboard) for 3 options to participate. One option is to follow [VBench Usage info](https://github.com/Vchitect/VBench?tab=readme-ov-file#usage), and upload the generated `result.json` file here. After clicking the `Submit here!` button, click the `Refresh` button.
-**Model Information**: What are the details of these Video Generation Models? See [HERE](https://github.com/Vchitect/VBench/tree/master/sampled_videos#what-are-the-details-of-the-video-generation-models)
 
-**Credits**: This leaderboard is updated and maintained by the team of [VBench Contributors](https://github.com/Vchitect/VBench?tab=readme-ov-file#muscle-vbench-contributors).
+**Credits**: This leaderboard is updated and maintained by the team of [V-STaR Contributors]().
 """
 
 SUBMIT_INTRODUCTION = """# Submit on VBench Benchmark Introduction
 ## 🎈
-1. Please note that you need to obtain the file `evaluation_results/*.json` by running VBench in Github. You may conduct an [Offline Check](https://github.com/Vchitect/VBench?tab=readme-ov-file#get-final-score-and-submit-to-leaderboard) before uploading.
-2. Then, pack these JSON files into a `ZIP` archive, ensuring that the top-level directory of the ZIP contains the individual JSON files.
-3. Finally, upload the ZIP archive below.
-⚠️ Uploading generated videos or images of the model is invalid!
-⚠️ Submissions that do not correctly fill in the model name and model link may be deleted by the VBench team. The contact information you filled in will not be made public.
+⚠️ Please note that you need to obtain the file `results/*.json` by running V-STaR in Github. You may conduct an [Offline Eval]() before submitting.
+⚠️ Then, please contact us for updating your results via [email1]([email protected]) or [email2](hu.[email protected]).
 """
 
 TABLE_INTRODUCTION = """
 """
 
 LEADERBORAD_INFO = """
-VBench, a comprehensive benchmark suite for video generative models. We design a comprehensive and hierarchical Evaluation Dimension Suite to decompose "video generation quality" into multiple well-defined dimensions to facilitate fine-grained and objective evaluation. For each dimension and each content category, we carefully design a Prompt Suite as test cases, and sample Generated Videos from a set of video generation models. For each evaluation dimension, we specifically design an Evaluation Method Suite, which uses carefully crafted method or designated pipeline for automatic objective evaluation. We also conduct Human Preference Annotation for the generated videos for each dimension, and show that VBench evaluation results are well aligned with human perceptions. VBench can provide valuable insights from multiple perspectives.
+V-STaR, a comprehensive spatio-temporal reasoning benchmark for video large language models (Video-LLMs). We construct a fine-grained reasoning dataset with coarse-to-fine CoT questions, enabling a structured evaluation of spatio-temporal reasoning. Specifically, we introduce a Reverse Spatio-Temporal Reasoning (RSTR) task to quantify models’ spatio-temporal reasoning ability. For each dimension and each content category, we carefully design a Prompt Suite as test cases, and sample Generated Videos from a set of video generation models. Experiments on V-STaR reveal although many models perform well on “what”, some struggle to ground their answers in time and location. This finding highlights a fundamental weakness in existing Video-LLMs regarding causal spatio-temporal reasoning and inspires research in improving trustworthy spatio-temporal understanding in future Video-LLMs.
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
-CITATION_BUTTON_TEXT = r"""@inproceedings{huang2023vbench,
-title={{VBench}: Comprehensive Benchmark Suite for Video Generative Models},
-author={Huang, Ziqi and He, Yinan and Yu, Jiashuo and Zhang, Fan and Si, Chenyang and Jiang, Yuming and Zhang, Yuanhan and Wu, Tianxing and Jin, Qingyang and Chanpaisit, Nattapol and Wang, Yaohui and Chen, Xinyuan and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
-booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
-year={2024}
-}"""
+CITATION_BUTTON_TEXT = r"""to be updated"""
 
 QUALITY_CLAIM_TEXT = "We use all the videos on Sora website (https://openai.com/sora) for a preliminary evaluation, including the failure case videos Sora provided."
 
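
For context, here is a minimal sketch of how a Gradio app in this Space might consume the updated constants. The app code is not part of this commit, so the widget layout, component choices, and the `constants`/`app` module names below are assumptions for illustration, not the Space's actual implementation.

```python
# Hypothetical app sketch (assumed Gradio-based Space); only the constant names
# come from constants.py in this commit — everything else is illustrative.
import gradio as gr

from constants import (
    LEADERBORAD_INTRODUCTION,   # leaderboard intro markdown (name as spelled in the file)
    LEADERBORAD_INFO,           # V-STaR benchmark description
    SUBMIT_INTRODUCTION,        # submission instructions (now: email results/*.json)
    CITATION_BUTTON_LABEL,
    CITATION_BUTTON_TEXT,       # currently the placeholder "to be updated"
)

with gr.Blocks() as demo:
    # Intro markdown: badges, the three bullet points, and credits render at the top.
    gr.Markdown(LEADERBORAD_INTRODUCTION)

    with gr.Tab("Submit"):
        gr.Markdown(SUBMIT_INTRODUCTION)

    with gr.Accordion("About V-STaR"):
        gr.Markdown(LEADERBORAD_INFO)

    # Citation exposed as a copyable textbox, using the label/text constants.
    gr.Textbox(
        value=CITATION_BUTTON_TEXT,
        label=CITATION_BUTTON_LABEL,
        show_copy_button=True,
    )

if __name__ == "__main__":
    demo.launch()
```

Under this assumption, the diff above changes only the text the UI displays (intro, badges, submission instructions, benchmark description, citation placeholder); no application logic needs to change.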