Update README.md
README.md CHANGED
@@ -1,15 +1,34 @@
----
-license: apache-2.0
-datasets:
-- declare-lab/Emma-X-GCOT
-metrics:
-- accuracy
-base_model:
-- openvla/openvla-7b
-
-
-
-
+---
+license: apache-2.0
+datasets:
+- declare-lab/Emma-X-GCOT
+metrics:
+- accuracy
+base_model:
+- openvla/openvla-7b
+pipeline_tag: image-text-to-text
+---
+
+<h1 align="center">✨
+<br/>
+Meet Emma-X, an Embodied Multimodal Action Model
+<br/>
+✨✨✨
+
+
+</h1>
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/declare-lab/Emma-X/main/Emma-X.png" alt="Emma-X" width="300" />
+
+<br/>
+
+[arXiv](https://arxiv.org/abs/2412.11974) [Hugging Face](https://huggingface.co/declare-lab/Emma-X) [Project Page](https://declare-lab.github.io/Emma-X/)
+
+
+</div>
+
+## Model Overview
 
 EMMA-X is an Embodied Multimodal Action (VLA) Model designed to bridge the gap between Visual-Language Models (VLMs) and robotic control tasks. EMMA-X generalizes effectively across diverse environments, objects, and instructions while excelling at long-horizon spatial reasoning and grounded task planning using a novel Trajectory Segmentation Strategy.
 
@@ -68,4 +87,14 @@ action, grounded_reasoning = vla.generate_actions(
 print("Grounded Reasoning:", grounded_reasoning)
 # Execute...
 robot.act(action, ...)
+```
+
+## Citation
+```
+@article{sun2024emma,
+title={Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning},
+author={Sun, Qi and Hong, Pengfei and Pala, Tej Deep and Toh, Vernon and Tan, U-Xuan and Ghosal, Deepanway and Poria, Soujanya},
+journal={arXiv preprint arXiv:2412.11974},
+year={2024}
+}
 ```
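The second hunk shows only the tail of the README's inference example: `generate_actions` returning an action together with a grounded-reasoning string, followed by `robot.act`. The snippet below is a minimal, self-contained sketch of that call pattern only; `Observation`, `FakeVLA`, and `FakeRobot` are hypothetical stand-ins introduced for illustration, not the actual Emma-X or robot APIs.

```python
# Illustrative sketch of the call pattern in the hunk above.
# FakeVLA and FakeRobot are hypothetical stand-ins, NOT the real Emma-X or
# robot interfaces; only the generate_actions -> print -> act flow mirrors
# the README snippet.

from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Observation:
    image: Any          # current camera frame (placeholder)
    instruction: str    # natural-language task instruction


class FakeVLA:
    """Hypothetical stand-in for the Emma-X policy interface."""

    def generate_actions(self, image: Any, instruction: str) -> Tuple[List[float], str]:
        # A real policy would run the VLA model here; this stub returns a
        # dummy 7-DoF action and a placeholder reasoning string.
        action = [0.0] * 7
        grounded_reasoning = f"Segment plan for: {instruction}"
        return action, grounded_reasoning


class FakeRobot:
    """Hypothetical robot wrapper exposing an `act` method."""

    def act(self, action: List[float]) -> None:
        print("Executing action:", action)


obs = Observation(image=None, instruction="pick up the red block")
vla, robot = FakeVLA(), FakeRobot()

action, grounded_reasoning = vla.generate_actions(obs.image, obs.instruction)
print("Grounded Reasoning:", grounded_reasoning)

# Execute...
robot.act(action)
```

In the actual README, `vla` would be the loaded Emma-X policy and `robot` the controller for the target manipulator; everything beyond the final three calls is invented scaffolding added so the sketch runs standalone.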