Image-Text-to-Text · Safetensors · openvla · custom_code
soujanyaporia committed
Commit 200d588 · verified · 1 Parent(s): 14bfb19

Update README.md

Files changed (1)
  1. README.md +41 -12
README.md CHANGED
@@ -1,15 +1,34 @@
- ---
- license: apache-2.0
- datasets:
- - declare-lab/Emma-X-GCOT
- metrics:
- - accuracy
- base_model:
- - openvla/openvla-7b
- ---
-
- # Emma-X (7B)
- Model Overview
+ ---
+ license: apache-2.0
+ datasets:
+ - declare-lab/Emma-X-GCOT
+ metrics:
+ - accuracy
+ base_model:
+ - openvla/openvla-7b
+ pipeline_tag: image-text-to-text
+ ---
+
+ <h1 align="center">✨
+ <br/>
+ Meet Emma-X, an Embodied Multimodal Action Model
+ <br/>
+ ✨✨✨
+
+
+ </h1>
+
+ <div align="center">
+ <img src="https://raw.githubusercontent.com/declare-lab/Emma-X/main/Emma-X.png" alt="Emma-X" width="300" />
+
+ <br/>
+
+ [![arXiv](https://img.shields.io/badge/arxiv-2412.11974-b31b1b)](https://arxiv.org/abs/2412.11974) [![Emma-X](https://img.shields.io/badge/Huggingface-Emma--X-brightgreen?style=flat&logo=huggingface&color=violet)](https://huggingface.co/declare-lab/Emma-X) [![Static Badge](https://img.shields.io/badge/Demos-declare--lab-brightred?style=flat)](https://declare-lab.github.io/Emma-X/)
+
+
+ </div>
+
+ ## Model Overview

  EMMA-X is an Embodied Multimodal Action (VLA) Model designed to bridge the gap between Visual-Language Models (VLMs) and robotic control tasks. EMMA-X generalizes effectively across diverse environments, objects, and instructions while excelling at long-horizon spatial reasoning and grounded task planning using a novel Trajectory Segmentation Strategy.

@@ -68,4 +87,14 @@ action, grounded_reasoning = vla.generate_actions(
  print("Grounded Reasoning:", grounded_reasoning)
  # Execute...
  robot.act(action, ...)
+ ```
+
+ ## Citation
+ ```
+ @article{sun2024emma,
+ title={Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning},
+ author={Sun, Qi and Hong, Pengfei and Pala, Tej Deep and Toh, Vernon and Tan, U-Xuan and Ghosal, Deepanway and Poria, Soujanya},
+ journal={arXiv preprint arXiv:2412.11974},
+ year={2024}
+ }
  ```
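
Note on the usage snippet referenced in the diff: the README fragment above ends with `vla.generate_actions(...)` followed by `robot.act(...)`, but the surrounding setup is not shown in this hunk. The sketch below illustrates how that call might be wired up, assuming the OpenVLA-style `transformers` loading path (`AutoModelForVision2Seq` / `AutoProcessor` with `trust_remote_code=True`, consistent with the repo's `openvla` and `custom_code` tags). Only the names `generate_actions`, `grounded_reasoning`, and `robot.act` come from the README fragment; the keyword arguments, image path, and robot controller here are illustrative assumptions, not the confirmed API.

```python
# Hedged sketch, not the confirmed Emma-X API: load the checkpoint the way
# OpenVLA-derived models are typically loaded, then call the generate_actions()
# pattern visible in the diff above. Argument names are assumptions.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "declare-lab/Emma-X"

# custom_code tag on the repo implies trust_remote_code is needed.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("third_person_view.png")   # placeholder path for the current camera frame
instruction = "put the carrot in the pot"     # free-form task instruction (example)

# Mirrors the call pattern shown in the diff; kwargs are illustrative.
action, grounded_reasoning = vla.generate_actions(
    image=image,
    instruction=instruction,
)

print("Grounded Reasoning:", grounded_reasoning)
# Execute on the robot: robot.act(action, ...) in the README assumes a
# user-provided robot controller, so it is left commented out here.
# robot.act(action, ...)
```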