LISA: Reasoning Segmentation Via Large Language Model

This is the official implementation of LISA (Large Language Instructed Segmentation Assistant).

News

  • [2023.8.2] Paper is released and GitHub repo is created.

TODO

  • Hugging Face Demo
  • ReasonSeg Dataset Release
  • Code and Models Release

Abstract

In this work, we propose a new segmentation task: reasoning segmentation. The task is designed to output a segmentation mask given a complex and implicit query text. We establish a benchmark comprising over one thousand image-instruction pairs, incorporating intricate reasoning and world knowledge for evaluation purposes. Finally, we present LISA (Large Language Instructed Segmentation Assistant), which inherits the language generation capabilities of the multi-modal Large Language Model (LLM) while also possessing the ability to produce segmentation masks. For more details, please refer to:

LISA: Reasoning Segmentation Via Large Language Model [Paper]
Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia

Highlights

LISA unlocks new segmentation capabilities for multi-modal LLMs and can handle cases involving:

  1. complex reasoning;
  2. world knowledge;
  3. explanatory answers;
  4. multi-turn conversation.

LISA also demonstrates robust zero-shot capability when trained exclusively on reasoning-free datasets. In addition, fine-tuning the model on only 239 reasoning segmentation image-instruction pairs further improves performance.

Experimental Results

Citation

If you find this project useful in your research, please consider citing:

@article{reason-seg,
  title={LISA: Reasoning Segmentation Via Large Language Model},
  author={Xin Lai and Zhuotao Tian and Yukang Chen and Yanwei Li and Yuhui Yuan and Shu Liu and Jiaya Jia},
  journal={arXiv:2308.00692},
  year={2023}
}

Acknowledgement