Commit 38f42b3
lixuejing committed
1 Parent(s): 79ed997
update

Browse files:
- app.py +1 -0
- src/about.py +28 -0
app.py CHANGED
@@ -11,6 +11,7 @@ from src.about import (
     INTRODUCTION_TEXT,
     LLM_BENCHMARKS_TEXT,
     TITLE,
+    EVALUATION_METRIC_TEXT,
 )
 from src.display.css_html_js import custom_css
 from src.display.utils import (
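The hunk above only adds the import; the code that uses the new constant is not part of this commit. For context, here is a minimal sketch of the usual pattern in Gradio leaderboard Spaces, where each `*_TEXT` constant is a markdown string rendered in its own tab. The stand-in string, tab label, and layout below are assumptions, not taken from app.py:

```python
import gradio as gr

# Stand-in for the constant imported from src.about in the diff above;
# the real markdown text lives in src/about.py.
EVALUATION_METRIC_TEXT = "## Evaluation metric\n\nHow leaderboard scores are computed."

# Typical leaderboard layout: each *_TEXT constant is rendered as markdown
# inside its own tab. The tab name here is illustrative only.
with gr.Blocks() as demo:
    with gr.Tabs():
        with gr.TabItem("Evaluation Metric"):
            gr.Markdown(EVALUATION_METRIC_TEXT)

if __name__ == "__main__":
    demo.launch()
```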
src/about.py CHANGED
@@ -96,6 +96,34 @@ FlagEvalMM is an open-source evaluation framework designed to comprehensively a
 - Extensible Design: Easily extendable to incorporate new models, benchmarks, and evaluation metrics.
 
 # Embodied verse
+EmbodiedVerse-Open is a meta-dataset composed of 10 datasets for comprehensively evaluating models in embodied-intelligence scenarios, including:
+
+<a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: A collection of 100 real-world images from diverse cluttered environments, each annotated with a sentence describing a desired free-space location and a corresponding mask, designed to evaluate free-space referring expressions based on spatial relations.
+<a href="https://zeyofu.github.io/blink/" target="_blank"> Blink </a>: A benchmark of visual problems that humans can solve easily; EmbodiedVerse samples the categories related to spatial understanding (Counting, Relative_Depth, Spatial_Relation, Multi-view_Reasoning, Visual_Correspondence).
+<a href="https://huggingface.co/datasets/nyu-visionx/CV-Bench" target="_blank"> CVBench </a>: A vision-centric benchmark containing 2,638 manually inspected examples.
+<a href="https://arxiv.org/abs/2411.16537" target="_blank"> RoboSpatial-Home </a>: A new benchmark designed to evaluate the spatial reasoning of vision-language models (VLMs) in real-world indoor robotic environments.
+<a href="https://huggingface.co/datasets/Phineas476/EmbSpatial-Bench" target="_blank"> EmbSpatial-Bench </a>: A benchmark for evaluating the embodied spatial understanding of LVLMs, automatically derived from embodied scenes and covering 6 spatial relationships from an egocentric perspective.
+<a href="https://danielchyeh.github.io/All-Angles-Bench/" target="_blank"> All-Angles Bench </a>: A benchmark for multi-view understanding, including over 2,100 human-annotated multi-view QA pairs across 90 real-world scenes.
+<a href="https://huggingface.co/datasets/nyu-visionx/VSI-Bench" target="_blank"> VSI-Bench </a>: A video-based benchmark that constructs questions from egocentric videos of real indoor scenes, aiming to evaluate the visual-spatial intelligence of multimodal large models. EmbodiedVerse uses a tiny subset containing 400 questions.
+<a href="https://arxiv.org/pdf/2412.07755" target="_blank"> SAT </a>: A challenging real-image dynamic spatial benchmark.
+<a href="https://arxiv.org/pdf/2412.04447" target="_blank"> EgoPlan-Bench2 </a>: A benchmark of everyday tasks spanning 4 major domains and 24 detailed scenarios, closely aligned with human daily life.
+<a href="https://github.com/embodiedreasoning/ERQA" target="_blank"> ERQA </a>: An evaluation benchmark covering a variety of topics related to spatial reasoning and world knowledge, focused on real-world scenarios, particularly in the context of robotics.
+
+Dataset subset links: coming soon
+
 
 ## Details and logs
 You can find:
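The composition described above can also be summarized programmatically. A hypothetical Python sketch of the meta-dataset's makeup follows; the source names and the few stated counts come from the text, while the dict layout and field names are illustrative and not the Space's actual code:

```python
# Hypothetical summary of the EmbodiedVerse-Open composition described above.
# Only details stated in the text are filled in; unknown counts are None.
EMBODIEDVERSE_OPEN = {
    "Where2Place": {"examples": 100, "focus": "free-space referring"},
    "Blink": {"examples": None, "focus": "spatial-understanding categories"},
    "CVBench": {"examples": 2638, "focus": "vision-centric QA"},
    "RoboSpatial-Home": {"examples": None, "focus": "indoor robotic spatial reasoning"},
    "EmbSpatial-Bench": {"examples": None, "focus": "egocentric spatial relations (6 types)"},
    "All-Angles Bench": {"examples": 2100, "focus": "multi-view QA over 90 scenes"},
    "VSI-Bench": {"examples": 400, "focus": "egocentric video spatial QA (tiny subset)"},
    "SAT": {"examples": None, "focus": "dynamic spatial reasoning"},
    "EgoPlan-Bench2": {"examples": None, "focus": "everyday tasks (4 domains, 24 scenarios)"},
    "ERQA": {"examples": None, "focus": "robotics-oriented reasoning and world knowledge"},
}

if __name__ == "__main__":
    # List each source with its example count where the text states one.
    for name, meta in EMBODIEDVERSE_OPEN.items():
        print(f"{name}: {meta['examples'] or 'n/a'} examples, {meta['focus']}")
```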