x-lai committed
Commit db69ef7 · 1 Parent(s): b4fb030

Update README.md


Former-commit-id: 7e3f1ab8e60e3f95fa07f863021858ea55c727c1

Files changed (1)
  1. README.md +9 -8
README.md CHANGED
@@ -12,14 +12,6 @@ This is the official implementation of ***LISA(large Language Instructed Segment
 - [ ] ReasonSeg Dataset Release
 - [ ] Codes and models Release
 
-**LISA** unlocks the new segmentation capabilities of multi-modal LLMs, and can handle cases involving:
-1. complex reasoning;
-2. world knowledge;
-3. explanatory answers;
-4. multi-turn conversation.
-
-**LISA** also demonstrates robust zero-shot capability when trained exclusively on reasoning-free datasets. In addition, fine-tuning the model with merely 239 reasoning segmentation image-instruction pairs results in further performance enhancement.
-
 ## Abstract
 In this work, we propose a new segmentation task --- ***reasoning segmentation***. The task is designed to output a segmentation mask given a complex and implicit query text. We establish a benchmark comprising over one thousand image-instruction pairs, incorporating intricate reasoning and world knowledge for evaluation purposes. Finally, we present LISA: Large-language Instructed Segmentation Assistant, which inherits the language generation capabilities of the multi-modal Large Language Model (LLM) while also possessing the ability to produce segmentation masks.
 For more details, please refer to:
@@ -35,6 +27,15 @@ For more details, please refer to:
 
 <p align="center"> <img src="imgs/fig_overview_v6_crop.png" width="100%"> </p>
 
+## Highlights
+**LISA** unlocks the new segmentation capabilities of multi-modal LLMs, and can handle cases involving:
+1. complex reasoning;
+2. world knowledge;
+3. explanatory answers;
+4. multi-turn conversation.
+
+**LISA** also demonstrates robust zero-shot capability when trained exclusively on reasoning-free datasets. In addition, fine-tuning the model with merely 239 reasoning segmentation image-instruction pairs results in further performance enhancement.
+
 ## Experimental results
 <p align="center"> <img src="imgs/Table1.png" width="80%"> </p>
 
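
The abstract in the diff above defines reasoning segmentation as mapping an image and an implicit, reasoning-heavy query text to a segmentation mask. The minimal sketch below only illustrates that input/output contract; the function name `reasoning_segmentation`, its signature, and the placeholder output are hypothetical and are not part of the LISA codebase.

```python
# Hypothetical illustration of the reasoning-segmentation task interface:
# an image plus an implicit text query in, a binary segmentation mask out.
# All names and the trivial placeholder logic are assumptions, not LISA's API.
import numpy as np


def reasoning_segmentation(image: np.ndarray, query: str) -> np.ndarray:
    """Return a boolean mask of shape (H, W) for the region the query implies."""
    # A real model would reason over the implicit query (e.g. "the object that
    # keeps food cold") and ground it in the image; here we return an empty mask.
    height, width = image.shape[:2]
    return np.zeros((height, width), dtype=bool)


if __name__ == "__main__":
    dummy_image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in RGB image
    mask = reasoning_segmentation(dummy_image, "the object that keeps food cold")
    print(mask.shape, mask.dtype)  # (480, 640) bool
```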