Commit db69ef7 by x-lai
Parent: b4fb030
Update README.md
Former-commit-id: 7e3f1ab8e60e3f95fa07f863021858ea55c727c1
README.md CHANGED
@@ -12,14 +12,6 @@ This is the official implementation of ***LISA(large Language Instructed Segment
 - [ ] ReasonSeg Dataset Release
 - [ ] Codes and models Release
 
-**LISA** unlocks the new segmentation capabilities of multi-modal LLMs, and can handle cases involving:
-1. complex reasoning;
-2. world knowledge;
-3. explanatory answers;
-4. multi-turn conversation.
-
-**LISA** also demonstrates robust zero-shot capability when trained exclusively on reasoning-free datasets. In addition, fine-tuning the model with merely 239 reasoning segmentation image-instruction pairs results in further performance enhancement.
-
 ## Abstract
 In this work, we propose a new segmentation task --- ***reasoning segmentation***. The task is designed to output a segmentation mask given a complex and implicit query text. We establish a benchmark comprising over one thousand image-instruction pairs, incorporating intricate reasoning and world knowledge for evaluation purposes. Finally, we present LISA: Large-language Instructed Segmentation Assistant, which inherits the language generation capabilities of the multi-modal Large Language Model (LLM) while also possessing the ability to produce segmentation masks.
 For more details, please refer to:
@@ -35,6 +27,15 @@ For more details, please refer to:
 
 <p align="center"> <img src="imgs/fig_overview_v6_crop.png" width="100%"> </p>
 
+## Highlights
+**LISA** unlocks the new segmentation capabilities of multi-modal LLMs, and can handle cases involving:
+1. complex reasoning;
+2. world knowledge;
+3. explanatory answers;
+4. multi-turn conversation.
+
+**LISA** also demonstrates robust zero-shot capability when trained exclusively on reasoning-free datasets. In addition, fine-tuning the model with merely 239 reasoning segmentation image-instruction pairs results in further performance enhancement.
+
 ## Experimental results
 <p align="center"> <img src="imgs/Table1.png" width="80%"> </p>
 