hexuan21 committed · verified
Commit 8642203 · Parent(s): 9c8d019

Update README.md

Files changed (1): README.md (+4 −4)
README.md CHANGED
@@ -11,7 +11,7 @@ pipeline_tag: visual-question-answering
 ---
 
 
-[📃Paper](https://arxiv.org/abs/2406.15252) | [🌐Website](https://tiger-ai-lab.github.io/VideoScore/) | [💻Github](https://github.com/TIGER-AI-Lab/VideoScore) | [🛢️Datasets](https://huggingface.co/datasets/TIGER-Lab/VideoFeedback) | [🤗Model (VideoScore)](https://huggingface.co/TIGER-Lab/VideoScore) | [🤗Model (VideoScore-anno-only)](https://huggingface.co/TIGER-Lab/VideoScore-anno-only) | [🤗Model (VideoScore-v1.1)](https://huggingface.co/TIGER-Lab/VideoScore-v1.1)| [🤗Demo](https://huggingface.co/spaces/TIGER-Lab/VideoScore) | [📉Wandb (VideoScore-anno-only)](https://api.wandb.ai/links/xuanhe/4vs5k0cq)
+[📃Paper](https://arxiv.org/abs/2406.15252) | [🌐Website](https://tiger-ai-lab.github.io/VideoScore/) | [💻Github](https://github.com/TIGER-AI-Lab/VideoScore) | [🛢️Datasets](https://huggingface.co/datasets/TIGER-Lab/VideoFeedback) | [🤗Model (VideoScore)](https://huggingface.co/TIGER-Lab/VideoScore) | [🤗Demo](https://huggingface.co/spaces/TIGER-Lab/VideoScore) | [📉Wandb (VideoScore-anno-only)](https://api.wandb.ai/links/xuanhe/4vs5k0cq)
 
 
 ![VideoScore](https://tiger-ai-lab.github.io/VideoScore/static/images/teaser.png)
@@ -20,7 +20,7 @@ pipeline_tag: visual-question-answering
 - 🧐🧐[VideoScore-anno-only](https://huggingface.co/TIGER-Lab/VideoScore-anno-only) is a variant from [VideoScore](https://huggingface.co/TIGER-Lab/VideoScore), trained on VideoFeedback dataset
 excluding the real videos.
 
-- [VideoScore](https://huggingface.co/TIGER-Lab/VideoScore) series is a video quality evaluation model series, taking [Mantis-8B-Idefics2](https://huggingface.co/TIGER-Lab/Mantis-8B-Idefics2) as base-model
+- [VideoScore](https://huggingface.co/TIGER-Lab/VideoScore) series is a video quality evaluation model series, taking [Mantis-8B-Idefics2](https://huggingface.co/TIGER-Lab/Mantis-8B-Idefics2) or [Qwen/Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) as base-model,
 and trained on [VideoFeedback](https://huggingface.co/datasets/TIGER-Lab/VideoFeedback),
 a large video evaluation dataset with multi-aspect human scores.
 
@@ -49,8 +49,8 @@ The evaluation results are shown below:
 
 | metric | Final Avg Score | VideoFeedback-test | EvalCrafter | GenAI-Bench | VBench |
 |:-----------------:|:--------------:|:--------------:|:-----------:|:-----------:|:----------:|
-| VideoScore (reg) | **69.6** | 75.7 | **51.1** | **78.5** | **73.0** |
-| VideoScore (gen) | 55.6 | **77.1** | 27.6 | 59.0 | 58.7 |
+| VideoScore (reg) | **69.6** | 75.7 | **51.1** | **78.5** | **73.0** |
+| VideoScore (gen) | 55.6 | **77.1** | 27.6 | 59.0 | 58.7 |
 | Gemini-1.5-Pro | <u>39.7</u> | 22.1 | 22.9 | 60.9 | 52.9 |
 | Gemini-1.5-Flash | 39.4 | 20.8 | 17.3 | <u>67.1</u> | 52.3 |
 | GPT-4o | 38.9 | <u>23.1</u> | 28.7 | 52.0 | 51.7 |
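
The Final Avg Score column in the table above is consistent with a plain unweighted mean of the four benchmark columns (VideoFeedback-test, EvalCrafter, GenAI-Bench, VBench). A quick sketch to verify that, with the scores transcribed from the table; the averaging rule and the helper name `final_avg` are our assumption, not something the README states:

```python
def final_avg(benchmarks):
    """Unweighted mean over the four benchmark scores, rounded to one decimal.

    Assumption: the README's 'Final Avg Score' is a simple mean; the
    reproduced values below match the table to rounding.
    """
    return round(sum(benchmarks) / len(benchmarks), 1)

# (VideoFeedback-test, EvalCrafter, GenAI-Bench, VBench), from the table above.
scores = {
    "VideoScore (reg)":  (75.7, 51.1, 78.5, 73.0),
    "VideoScore (gen)":  (77.1, 27.6, 59.0, 58.7),
    "Gemini-1.5-Pro":    (22.1, 22.9, 60.9, 52.9),
    "Gemini-1.5-Flash":  (20.8, 17.3, 67.1, 52.3),
    "GPT-4o":            (23.1, 28.7, 52.0, 51.7),
}

for model, b in scores.items():
    print(f"{model}: {final_avg(b)}")
```

Running this reproduces the Final Avg Score column (69.6, 55.6, 39.7, 39.4, 38.9), which is how the note diff leaves VideoScore (reg) as the top scorer overall.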