x-lai committed
Commit db69ef7 · 1 Parent(s): b4fb030

Update README.md


Former-commit-id: 7e3f1ab8e60e3f95fa07f863021858ea55c727c1

Files changed (1)
  1. README.md +9 -8
README.md CHANGED
@@ -12,14 +12,6 @@ This is the official implementation of ***LISA(large Language Instructed Segment
 - [ ] ReasonSeg Dataset Release
 - [ ] Codes and models Release
 
-**LISA** unlocks the new segmentation capabilities of multi-modal LLMs, and can handle cases involving:
-1. complex reasoning;
-2. world knowledge;
-3. explanatory answers;
-4. multi-turn conversation.
-
-**LISA** also demonstrates robust zero-shot capability when trained exclusively on reasoning-free datasets. In addition, fine-tuning the model with merely 239 reasoning segmentation image-instruction pairs results in further performance enhancement.
-
 ## Abstract
 In this work, we propose a new segmentation task --- ***reasoning segmentation***. The task is designed to output a segmentation mask given a complex and implicit query text. We establish a benchmark comprising over one thousand image-instruction pairs, incorporating intricate reasoning and world knowledge for evaluation purposes. Finally, we present LISA: Large-language Instructed Segmentation Assistant, which inherits the language generation capabilities of the multi-modal Large Language Model (LLM) while also possessing the ability to produce segmentation masks.
 For more details, please refer to:
@@ -35,6 +27,15 @@ For more details, please refer to:
 
 <p align="center"> <img src="imgs/fig_overview_v6_crop.png" width="100%"> </p>
 
+## Highlights
+**LISA** unlocks the new segmentation capabilities of multi-modal LLMs, and can handle cases involving:
+1. complex reasoning;
+2. world knowledge;
+3. explanatory answers;
+4. multi-turn conversation.
+
+**LISA** also demonstrates robust zero-shot capability when trained exclusively on reasoning-free datasets. In addition, fine-tuning the model with merely 239 reasoning segmentation image-instruction pairs results in further performance enhancement.
+
 ## Experimental results
 <p align="center"> <img src="imgs/Table1.png" width="80%"> </p>
 
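
The abstract in the diff above defines reasoning segmentation as mapping an image and an implicit, reasoning-heavy query text to a segmentation mask. The minimal sketch below only illustrates that input/output contract; the function name `reasoning_segmentation`, its signature, and the placeholder output are hypothetical and are not part of the LISA codebase.

```python
# Hypothetical illustration of the reasoning-segmentation task interface:
# an image plus an implicit text query in, a binary segmentation mask out.
# All names and the trivial placeholder logic are assumptions, not LISA's API.
import numpy as np


def reasoning_segmentation(image: np.ndarray, query: str) -> np.ndarray:
    """Return a boolean mask of shape (H, W) for the region the query implies."""
    # A real model would reason over the implicit query (e.g. "the object that
    # keeps food cold") and ground it in the image; here we return an empty mask.
    height, width = image.shape[:2]
    return np.zeros((height, width), dtype=bool)


if __name__ == "__main__":
    dummy_image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in RGB image
    mask = reasoning_segmentation(dummy_image, "the object that keeps food cold")
    print(mask.shape, mask.dtype)  # (480, 640) bool
```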