Spaces: V-STaR-Bench/V-STaR-LeaderBoard

V-STaR committed on Mar 16

Commit 0df35f9 (verified) · 1 Parent(s): ff9e632

Update constants.py

Files changed (1): constants.py

@@ -189,6 +189,7 @@ LEADERBORAD_INTRODUCTION = """# V-STaR Leaderboard
     - **Comprehensive Dimensions:** We evaluate Video-LLM’s spatio-temporal reasoning ability in answering questions explicitly in the context of “when”, “where”, and “what”.
     - **Human Alignment:** We conducted extensive experiments and human annotations to validate robustness of V-STaR.
+    - **New Metrics:** We proposed to use Arithmetic Mean (AM) and modified logarithmic Geometric Mean (LGM) to measure the spatio-temporal reasoning capability of Video-LLMs. We calculate AM and LGM from the "Accuracy" of VQA, "m_tIoU" of Temporal grounding and "m_vIoU" of Spatial Grounding, and we get the mean AM (mAM) and mean LGM (mLGM) from the results of our proposed 2 RSTR question chains.
     - **Valuable Insights:** V-STaR reveals a fundamental weakness in existing Video-LLMs regarding causal spatio-temporal reasoning.
     **Join Leaderboard**: Please contact us to update your results.
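
The added "New Metrics" line names AM and LGM but does not spell out their formulas. The sketch below shows one way they might be computed from the three per-chain scores. It assumes the scores are fractions in [0, 1], that LGM takes the form -(1/3) Σ ln(1 - x + ε), and that mAM and mLGM are plain averages over the question chains. The function names and the `EPS` value are hypothetical and do not come from `constants.py`.

```python
import math

# Assumed smoothing constant; the commit does not state a value.
EPS = 1e-6


def arithmetic_mean(acc, m_tiou, m_viou):
    """AM of VQA accuracy, temporal m_tIoU and spatial m_vIoU."""
    return (acc + m_tiou + m_viou) / 3.0


def log_geometric_mean(acc, m_tiou, m_viou, eps=EPS):
    """Assumed form of the modified LGM: -(1/3) * sum(ln(1 - x + eps)).

    Under this form, a score close to 1 contributes a large positive term,
    and eps keeps the logarithm finite when a score equals 1.
    """
    scores = (acc, m_tiou, m_viou)
    return -sum(math.log(1.0 - x + eps) for x in scores) / 3.0


def mean_over_chains(chain_scores, metric):
    """Average a metric over question chains (mAM or mLGM).

    chain_scores is a list of (acc, m_tiou, m_viou) tuples, one per chain.
    """
    return sum(metric(*scores) for scores in chain_scores) / len(chain_scores)
```

With two chains, `mean_over_chains(chains, arithmetic_mean)` would give mAM and `mean_over_chains(chains, log_geometric_mean)` would give mLGM, under the assumptions stated above.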