---
license: apache-2.0
pipeline_tag: mask-generation
---
# NanoSAM: Accelerated Segment Anything Model for Edge Deployment
- [GitHub](https://github.com/binh234/nanosam)
- [Demo](https://huggingface.co/spaces/dragonSwing/nanosam)
## Pretrained Models
NanoSAM performance on edge devices. Latency/throughput is measured on an NVIDIA Jetson Xavier NX and an NVIDIA T4 GPU, both with TensorRT in fp16. Data transfer time is included.
<table style="border-top: solid 1px; border-left: solid 1px; border-right: solid 1px; border-bottom: solid 1px">
<thead>
<tr>
<th rowspan=2 style="text-align: center; border-right: solid 1px">Model †</th>
<th colspan=2 style="text-align: center; border-right: solid 1px">:stopwatch: CPU (ms)</th>
<th colspan=2 style="text-align: center; border-right: solid 1px">:stopwatch: Jetson Xavier NX (ms)</th>
<th colspan=2 style="text-align: center; border-right: solid 1px">:stopwatch: T4 (ms)</th>
<th rowspan=2 style="text-align: center; border-right: solid 1px">Model Size</th>
<th rowspan=2 style="text-align: center; border-right: solid 1px">Link</th>
</tr>
<tr>
<th style="text-align: center; border-right: solid 1px">Image Encoder</th>
<th style="text-align: center; border-right: solid 1px">Full Pipeline</th>
<th style="text-align: center; border-right: solid 1px">Image Encoder</th>
<th style="text-align: center; border-right: solid 1px">Full Pipeline</th>
<th style="text-align: center; border-right: solid 1px">Image Encoder</th>
<th style="text-align: center; border-right: solid 1px">Full Pipeline</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center; border-right: solid 1px">PPHGV2-SAM-B1</td>
<td style="text-align: center; border-right: solid 1px">110ms</td>
<td style="text-align: center; border-right: solid 1px">180ms</td>
<td style="text-align: center; border-right: solid 1px">9.6ms</td>
<td style="text-align: center; border-right: solid 1px">17ms</td>
<td style="text-align: center; border-right: solid 1px">2.4ms</td>
<td style="text-align: center; border-right: solid 1px">5.8ms</td>
<td style="text-align: center; border-right: solid 1px">12.1MB</td>
<td style="text-align: center; border-right: solid 1px"><a href="https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b1_ln_nonorm_image_encoder.onnx">Link</a></td>
</tr>
<tr>
<td style="text-align: center; border-right: solid 1px">PPHGV2-SAM-B2</td>
<td style="text-align: center; border-right: solid 1px">200ms</td>
<td style="text-align: center; border-right: solid 1px">270ms</td>
<td style="text-align: center; border-right: solid 1px">12.4ms</td>
<td style="text-align: center; border-right: solid 1px">19.8ms</td>
<td style="text-align: center; border-right: solid 1px">3.2ms</td>
<td style="text-align: center; border-right: solid 1px">6.4ms</td>
<td style="text-align: center; border-right: solid 1px">28.1MB</td>
<td style="text-align: center; border-right: solid 1px"><a href="https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b2_ln_nonorm_image_encoder.onnx">Link</a></td>
</tr>
<tr>
<td style="text-align: center; border-right: solid 1px">PPHGV2-SAM-B4</td>
<td style="text-align: center; border-right: solid 1px">300ms</td>
<td style="text-align: center; border-right: solid 1px">370ms</td>
<td style="text-align: center; border-right: solid 1px">17.3ms</td>
<td style="text-align: center; border-right: solid 1px">24.7ms</td>
<td style="text-align: center; border-right: solid 1px">4.1ms</td>
<td style="text-align: center; border-right: solid 1px">7.5ms</td>
<td style="text-align: center; border-right: solid 1px">58.6MB</td>
<td style="text-align: center; border-right: solid 1px"><a href="https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b4_ln_nonorm_image_encoder.onnx">Link</a></td>
</tr>
<tr>
<td style="text-align: center; border-right: solid 1px">NanoSAM (ResNet18)</td>
<td style="text-align: center; border-right: solid 1px">500ms</td>
<td style="text-align: center; border-right: solid 1px">570ms</td>
<td style="text-align: center; border-right: solid 1px">22.4ms</td>
<td style="text-align: center; border-right: solid 1px">29.8ms</td>
<td style="text-align: center; border-right: solid 1px">5.8ms</td>
<td style="text-align: center; border-right: solid 1px">9.2ms</td>
<td style="text-align: center; border-right: solid 1px">60.4MB</td>
<td style="text-align: center; border-right: solid 1px"><a href="https://drive.google.com/file/d/14-SsvoaTl-esC3JOzomHDnI9OGgdO2OR/view?usp=drive_link">Link</a></td>
</tr>
<tr>
<td style="text-align: center; border-right: solid 1px">EfficientViT-SAM-L0</td>
<td style="text-align: center; border-right: solid 1px">1000ms</td>
<td style="text-align: center; border-right: solid 1px">1070ms</td>
<td style="text-align: center; border-right: solid 1px">31.6ms</td>
<td style="text-align: center; border-right: solid 1px">38ms</td>
<td style="text-align: center; border-right: solid 1px">6ms</td>
<td style="text-align: center; border-right: solid 1px">9.4ms</td>
<td style="text-align: center; border-right: solid 1px">117.4MB</td>
<td style="text-align: center; border-right: solid 1px"></td>
</tr>
</tbody>
</table>
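The latencies above can be converted to an approximate throughput. The following is a minimal sketch, assuming images are processed one at a time with no batching or pipelining (an assumption; the table does not state how throughput was measured). The helper name `latency_to_fps` is illustrative, and the numbers are the full-pipeline latencies copied from the table.

```python
# Full-pipeline latencies in milliseconds, copied from the table above
full_pipeline_ms = {
    "PPHGV2-SAM-B1": {"Jetson Xavier NX": 17.0, "T4": 5.8},
    "PPHGV2-SAM-B2": {"Jetson Xavier NX": 19.8, "T4": 6.4},
    "PPHGV2-SAM-B4": {"Jetson Xavier NX": 24.7, "T4": 7.5},
    "NanoSAM (ResNet18)": {"Jetson Xavier NX": 29.8, "T4": 9.2},
    "EfficientViT-SAM-L0": {"Jetson Xavier NX": 38.0, "T4": 9.4},
}


def latency_to_fps(latency_ms):
    """Images per second, assuming one image is processed at a time."""
    return 1000.0 / latency_ms


for model, devices in full_pipeline_ms.items():
    rates = ", ".join(
        f"{device}: {latency_to_fps(ms):.1f} FPS" for device, ms in devices.items()
    )
    print(f"{model} -> {rates}")
```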
Zero-shot instance segmentation on the COCO2017 validation dataset:
| Image Encoder | mAP<sup>mask</sup><br>50-95 | mIoU (all) | mIoU (large) | mIoU (medium) | mIoU (small) |
| --------------- | :-------------------: | :--------: | :----------: | :-----------: | :----------: |
| ResNet18 | - | 70.6 | 79.6 | 73.8 | 62.4 |
| MobileSAM | - | 72.8 | 80.4 | 75.9 | 65.8 |
| PPHGV2-B1 | 41.2 | 75.6 | 81.2 | 77.4 | 70.8 |
| PPHGV2-B2 | 42.6 | 76.5 | 82.2 | 78.5 | 71.5 |
| PPHGV2-B4 | 44.0 | 77.3 | 83.0 | 79.7 | 72.1 |
| EfficientViT-L0 | 45.6 | 78.6 | 83.7 | 81.0 | 73.3 |
## Usage
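The usage example in this section loads ONNX files from a local `data/` directory. One way to fetch an image encoder into that directory is sketched below, using only the Python standard library and the URL pattern of the download links in the Pretrained Models table. The helpers `weights_url` and `download_weights` are hypothetical names, not part of NanoSAM, and the location of the mask decoder file is not stated here, so it is not covered.

```python
import os
import urllib.request

# Base URL taken from the download links in the Pretrained Models table
BASE_URL = "https://huggingface.co/dragonSwing/nanosam/resolve/main"


def weights_url(filename):
    """Build the download URL for a file hosted in the model repository."""
    return f"{BASE_URL}/{filename}"


def download_weights(filename, dest_dir="data"):
    """Download one file into dest_dir (skipped if already present); return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, filename)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(weights_url(filename), dest)
    return dest


# Example call (requires network access):
# encoder_path = download_weights("sam_hgv2_b4_ln_nonorm_image_encoder.onnx")
```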
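```python3
import numpy as np
import PIL.Image

from nanosam.utils.predictor import Predictor

# Image encoder exported to ONNX, run on the CPU
image_encoder_cfg = {
    "path": "data/sam_hgv2_b4_ln_nonorm_image_encoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
    "normalize_input": False,
}

# Mask decoder exported to ONNX, run on the CPU
mask_decoder_cfg = {
    "path": "data/efficientvit_l0_mask_decoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
}

predictor = Predictor(image_encoder_cfg, mask_decoder_cfg)

image = PIL.Image.open("assets/dogs.jpg")
predictor.set_image(image)

# Prompt with a single foreground point (label 1).
# (x, y) is a placeholder pixel coordinate; replace it with a point on your object.
x, y = 100, 100
mask, _, _ = predictor.predict(np.array([[x, y]]), np.array([1]))
```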
Each point label takes one of the following values:
| Point Label | Description |
| :---------: | ------------------------- |
| 0 | Background point |
| 1 | Foreground point |
| 2 | Bounding box top-left |
| 3 | Bounding box bottom-right |
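As a minimal sketch of how these labels combine, a bounding-box prompt can be written as two points labeled 2 and 3, optionally followed by foreground or background points. `build_prompt` is a hypothetical helper, not part of NanoSAM; it only assembles the point and label lists in the layout that the `predictor.predict` call above receives.

```python
# Hypothetical helper (not part of NanoSAM): builds point/label lists for a prompt.
BACKGROUND, FOREGROUND, BOX_TOP_LEFT, BOX_BOTTOM_RIGHT = 0, 1, 2, 3


def build_prompt(box=None, foreground=(), background=()):
    """Return (points, labels) lists using the label values from the table above."""
    points, labels = [], []
    if box is not None:
        x0, y0, x1, y1 = box
        # Normalize so the first corner is top-left and the second is bottom-right
        left, right = min(x0, x1), max(x0, x1)
        top, bottom = min(y0, y1), max(y0, y1)
        points += [[left, top], [right, bottom]]
        labels += [BOX_TOP_LEFT, BOX_BOTTOM_RIGHT]
    for x, y in foreground:
        points.append([x, y])
        labels.append(FOREGROUND)
    for x, y in background:
        points.append([x, y])
        labels.append(BACKGROUND)
    return points, labels


points, labels = build_prompt(box=(50, 40, 300, 260), background=[(10, 10)])
print(points)  # [[50, 40], [300, 260], [10, 10]]
print(labels)  # [2, 3, 0]

# With a Predictor that already has an image set, these could be passed as:
# mask, _, _ = predictor.predict(np.array(points), np.array(labels))
```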