Update README.md
README.md
CHANGED
@@ -1,13 +1,13 @@
----
-license: apache-2.0
-datasets:
-- declare-lab/Emma-X-GCOT
-metrics:
-- accuracy
-base_model:
-- openvla/openvla-7b
-pipeline_tag: image-text-to-text
----

<h1 align="center">✨
<br/>
@@ -30,20 +30,19 @@ Meet Emma-X, an Embodied Multimodal Action Model

## Model Overview

-EMMA-X is an Embodied Multimodal Action (VLA) Model designed to bridge the gap between Visual-Language Models (VLMs) and robotic control tasks. EMMA-X generalizes effectively across diverse environments, objects, and instructions while excelling at long-horizon spatial reasoning and grounded task planning using a novel Trajectory Segmentation Strategy.
-
-EMMA-X is trained on a dataset derived from BridgeV2, containing 60,000 robot manipulation trajectories. Trained using a hierarchical dataset with visual grounded chain-of-thought reasoning, EMMA-X's output will include the following components:

-Grounded Chain-of-Thought Reasoning:
-Helps break down tasks into smaller, manageable subtasks, ensuring accurate task execution by mitigating hallucination in reasoning.

-Gripper Position Guidance: Affordance point inside the image.

-Look-Ahead Spatial Reasoning:
-Enables the model to plan actions while considering spatial guidance for effective planning, enhancing long-horizon task performance.

-

## Model Card
- **Developed by:** SUTD Declare Lab
@@ -53,8 +52,8 @@ Action: Action policy in 7-dimensional vector to control the robot ([WidowX-6Dof
- **Finetuned from:** [`openvla-7B`](https://huggingface.co/openvla/openvla-7b/)
- **Pretraining Dataset:** Augmented version of [Bridge V2](https://rail-berkeley.github.io/bridgedata/), for more info check our repository.
- **Repository:** [https://github.com/declare-lab/Emma-X/](https://github.com/declare-lab/Emma-X/)
-- **Paper:** [Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning]()
-- **Project Page & Videos:** []()

## Getting Started
```python
+---
+license: apache-2.0
+datasets:
+- declare-lab/Emma-X-GCOT
+metrics:
+- accuracy
+base_model:
+- openvla/openvla-7b
+pipeline_tag: image-text-to-text
+---

<h1 align="center">✨
<br/>

## Model Overview

+EMMA-X is an Embodied Multimodal Action (VLA) Model designed to bridge the gap between Vision-Language Models (VLMs) and robotic control tasks. EMMA-X generalizes effectively across diverse environments, objects, and instructions while excelling at long-horizon spatial reasoning and grounded task planning, using a novel Trajectory Segmentation Strategy. It relies on:

+- Hierarchical Embodiment Dataset: EMMA-X is trained on a dataset derived from BridgeV2 containing 60,000 robot manipulation trajectories, hierarchically annotated with visually grounded chain-of-thought reasoning. Trained on this dataset, EMMA-X produces output with the following components:

+- Grounded Chain-of-Thought Reasoning: Breaks tasks down into smaller, manageable subtasks, ensuring accurate task execution by mitigating hallucination in reasoning.

+- Gripper Position Guidance: An affordance point inside the image that indicates the target gripper position.

+- Look-Ahead Spatial Reasoning: Lets the model plan actions with explicit spatial guidance, improving long-horizon task performance.

+It generates:
+
+- Action: A 7-dimensional action vector for controlling the robot ([WidowX-6DoF](https://www.trossenrobotics.com/widowx-250)); a schematic sketch of this output structure follows below.
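As an illustration of the output structure described above, here is a minimal, hypothetical Python sketch of how one prediction step (grounded reasoning, gripper affordance point, and 7-dimensional action) could be represented and unpacked. The container and field names are assumptions, and the [dx, dy, dz, droll, dpitch, dyaw, gripper] ordering follows the common BridgeData/WidowX convention rather than anything stated in this README.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class EmmaXOutput:
    """Illustrative container for one EMMA-X prediction step (names are assumptions)."""
    reasoning: str                   # grounded chain-of-thought text (subtask plan)
    gripper_point: Tuple[int, int]   # affordance point in image pixel coordinates (x, y)
    action: Tuple[float, ...]        # 7-dimensional control vector

def describe_action(action):
    """Unpack a 7-D action assuming the usual BridgeData/WidowX layout:
    [dx, dy, dz, droll, dpitch, dyaw, gripper]."""
    dx, dy, dz, droll, dpitch, dyaw, gripper = action
    return (f"translate by ({dx:.3f}, {dy:.3f}, {dz:.3f}) m, "
            f"rotate by ({droll:.3f}, {dpitch:.3f}, {dyaw:.3f}) rad, "
            f"gripper {'open' if gripper > 0.5 else 'close'}")

# Hypothetical example values, for illustration only.
step = EmmaXOutput(
    reasoning="Subtask: move above the red block before grasping it.",
    gripper_point=(212, 148),
    action=(0.02, -0.01, 0.03, 0.0, 0.0, 0.1, 1.0),
)
print(describe_action(step.action))
```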

## Model Card
- **Developed by:** SUTD Declare Lab

- **Finetuned from:** [`openvla-7B`](https://huggingface.co/openvla/openvla-7b/)
- **Pretraining Dataset:** Augmented version of [Bridge V2](https://rail-berkeley.github.io/bridgedata/); for more information, see our repository.
- **Repository:** [https://github.com/declare-lab/Emma-X/](https://github.com/declare-lab/Emma-X/)
+- **Paper:** [Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning](https://arxiv.org/pdf/2412.11974)
+- **Project Page & Videos:** [https://declare-lab.github.io/Emma-X/](https://declare-lab.github.io/Emma-X/)
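The Getting Started code itself is outside this diff. As rough orientation only, the sketch below uses the OpenVLA-style transformers interface documented for the base openvla/openvla-7b checkpoint that Emma-X is finetuned from; whether the Emma-X checkpoint exposes the same processor and `predict_action` call is an assumption, so defer to the Getting Started section and the Emma-X repository for authoritative usage.

```python
# Sketch only: OpenVLA-style inference with Hugging Face transformers, as documented
# for the base openvla/openvla-7b model. Swapping in an Emma-X checkpoint id here is
# an assumption; follow this README's Getting Started section for the real usage.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "openvla/openvla-7b"  # base model; an Emma-X checkpoint id would go here (assumption)

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("observation.png")  # current camera frame
prompt = "In: What action should the robot take to pick up the red block?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)  # 7-D vector: translation, rotation, gripper
```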
## Getting Started
```python
|