Faster Segment Anything (MobileSAM)

MobileSAM performs on par with the original SAM (at least visually) and keeps exactly the same pipeline, changing only the image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M parameters) with a much smaller Tiny-ViT (5M parameters). On a single GPU, MobileSAM takes around 12ms per image: 8ms for the image encoder and 4ms for the mask decoder.
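As a rough illustration of the encoder swap described above, the sketch below models the pipeline as two stages and replaces only the first one. All names here (`SegmentationPipeline`, `heavy_encoder`, `tiny_encoder`, `mask_decoder`) are hypothetical stand-ins rather than the MobileSAM API, and the stub functions return labelled strings instead of real embeddings or masks.

```python
from dataclasses import dataclass, replace
from typing import Callable


# Hypothetical stubs: real encoders return image embeddings, not strings.
def heavy_encoder(image: str) -> str:
    return f"embedding({image}) from ViT-H"


def tiny_encoder(image: str) -> str:
    return f"embedding({image}) from Tiny-ViT"


def mask_decoder(embedding: str, prompt: str) -> str:
    # Prompt-guided decoder shared by both pipelines.
    return f"mask for {prompt!r} using {embedding}"


@dataclass(frozen=True)
class SegmentationPipeline:
    image_encoder: Callable[[str], str]
    mask_decoder: Callable[[str, str], str]

    def segment(self, image: str, prompt: str) -> str:
        # Encode the image first, then decode a mask for the given prompt.
        return self.mask_decoder(self.image_encoder(image), prompt)


original_sam = SegmentationPipeline(heavy_encoder, mask_decoder)
# Swap only the encoder; the decoder object is reused unchanged.
mobile_sam = replace(original_sam, image_encoder=tiny_encoder)

print(mobile_sam.segment("photo.jpg", "point"))
```

Because only the `image_encoder` field changes, any code written against the rest of the pipeline keeps working in this toy model, which mirrors the drop-in nature of the change.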

The ViT-based image encoders compare as follows:

| Image Encoder | Original SAM | MobileSAM |
|---|---|---|
| Parameters | 611M | 5M |
| Speed | 452ms | 8ms |

Original SAM and MobileSAM have exactly the same prompt-guided mask decoder:

| Mask Decoder | Original SAM | MobileSAM |
|---|---|---|
| Parameters | 3.876M | 3.876M |
| Speed | 4ms | 4ms |

The whole pipelines compare as follows:

| Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM |
|---|---|---|
| Parameters | 615M | 9.66M |
| Speed | 456ms | 12ms |
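To make the reported figures easier to compare, the short sketch below recomputes the pipeline latencies from the per-stage numbers and derives the resulting ratios. The values are copied from the tables above; the variable names are illustrative only, and the ratios are simple arithmetic rather than additional measurements.

```python
# Latencies in milliseconds, copied from the tables above.
encoder_ms = {"Original SAM": 452, "MobileSAM": 8}
decoder_ms = 4  # the mask decoder is the same in both models

# Whole-pipeline latency = image encoder + mask decoder.
pipeline_ms = {name: enc + decoder_ms for name, enc in encoder_ms.items()}

# Whole-pipeline parameter counts in millions, copied from the table above.
pipeline_params_m = {"Original SAM": 615.0, "MobileSAM": 9.66}

encoder_speedup = encoder_ms["Original SAM"] / encoder_ms["MobileSAM"]
pipeline_speedup = pipeline_ms["Original SAM"] / pipeline_ms["MobileSAM"]
param_ratio = pipeline_params_m["Original SAM"] / pipeline_params_m["MobileSAM"]

print(pipeline_ms)                                  # {'Original SAM': 456, 'MobileSAM': 12}
print(f"Encoder speedup: {encoder_speedup:.1f}x")   # 56.5x
print(f"Pipeline speedup: {pipeline_speedup:.0f}x") # 38x
print(f"Parameter ratio: {param_ratio:.1f}x")       # 63.7x
```

The encoder accounts for nearly all of the original latency, which is why replacing it alone yields most of the end-to-end gain.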

Acknowledgement

SAM (Segment Anything)
@article{kirillov2023segany,
  title={Segment Anything}, 
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
TinyViT (TinyViT: Fast Pretraining Distillation for Small Vision Transformers)
@InProceedings{tiny_vit,
  title={TinyViT: Fast Pretraining Distillation for Small Vision Transformers},
  author={Wu, Kan and Zhang, Jinnian and Peng, Houwen and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
  booktitle={European conference on computer vision (ECCV)},
  year={2022}
}

BibTeX:

@article{mobile_sam,
  title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
  author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
  journal={arXiv preprint arXiv:2306.14289},
  year={2023}
}