shreyajn committed on
Commit c27cbc6 · verified · 1 Parent(s): c8403d0

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +41 -267

README.md CHANGED
@@ -35,34 +35,25 @@ More details on model performance across various devices can be found
 
 | Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model |
 |---|---|---|---|---|---|---|---|---|
- | SAMDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 8.448 ms | 1 - 57 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.onnx) |
- | SAMDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 5.889 ms | 6 - 68 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.onnx) |
- | SAMDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 5.73 ms | 4 - 55 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.onnx) |
- | SAMDecoder | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 8.67 ms | 11 - 11 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.onnx) |
- | SAMEncoderPart1 | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 228.979 ms | 12 - 181 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart1.onnx) |
- | SAMEncoderPart1 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 161.859 ms | 36 - 838 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart1.onnx) |
- | SAMEncoderPart1 | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 158.396 ms | 35 - 789 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart1.onnx) |
- | SAMEncoderPart1 | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 231.597 ms | 43 - 43 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart1.onnx) |
- | SAMEncoderPart2 | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 781.713 ms | 12 - 147 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart2.onnx) |
- | SAMEncoderPart2 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 567.288 ms | 36 - 736 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart2.onnx) |
- | SAMEncoderPart2 | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 531.384 ms | 12 - 686 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart2.onnx) |
- | SAMEncoderPart2 | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 736.185 ms | 33 - 33 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart2.onnx) |
- | SAMEncoderPart3 | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 779.664 ms | 12 - 159 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart3.onnx) |
- | SAMEncoderPart3 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 576.302 ms | 22 - 724 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart3.onnx) |
- | SAMEncoderPart3 | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 530.988 ms | 12 - 686 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart3.onnx) |
- | SAMEncoderPart3 | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 729.557 ms | 33 - 33 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart3.onnx) |
- | SAMEncoderPart4 | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 770.123 ms | 12 - 151 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart4.onnx) |
- | SAMEncoderPart4 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 569.238 ms | 24 - 722 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart4.onnx) |
- | SAMEncoderPart4 | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 478.143 ms | 24 - 699 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart4.onnx) |
- | SAMEncoderPart4 | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 730.872 ms | 33 - 33 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart4.onnx) |
- | SAMEncoderPart5 | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 772.375 ms | 0 - 133 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart5.onnx) |
- | SAMEncoderPart5 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 568.921 ms | 24 - 720 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart5.onnx) |
- | SAMEncoderPart5 | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 481.0 ms | 12 - 686 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart5.onnx) |
- | SAMEncoderPart5 | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 737.772 ms | 33 - 33 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart5.onnx) |
- | SAMEncoderPart6 | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 768.673 ms | 12 - 148 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart6.onnx) |
- | SAMEncoderPart6 | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 568.747 ms | 22 - 726 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart6.onnx) |
- | SAMEncoderPart6 | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 531.699 ms | 12 - 686 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart6.onnx) |
- | SAMEncoderPart6 | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 727.465 ms | 33 - 33 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMEncoderPart6.onnx) |
+ | SAMDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 7.351 ms | 0 - 33 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 8.997 ms | 1 - 65 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.onnx) |
+ | SAMDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 5.236 ms | 0 - 49 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 6.184 ms | 4 - 74 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.onnx) |
+ | SAMDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 4.145 ms | 0 - 43 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | QNN | 3.618 ms | 4 - 41 MB | FP16 | NPU | Use Export Script |
+ | SAMDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 5.516 ms | 2 - 59 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.onnx) |
+ | SAMDecoder | SA7255P ADP | SA7255P | TFLITE | 53.049 ms | 0 - 40 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 7.365 ms | 0 - 31 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | SA8295P ADP | SA8295P | TFLITE | 9.842 ms | 0 - 36 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | SA8295P ADP | SA8295P | QNN | 7.413 ms | 0 - 17 MB | FP16 | NPU | Use Export Script |
+ | SAMDecoder | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 7.376 ms | 0 - 29 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | SA8775P ADP | SA8775P | TFLITE | 10.421 ms | 0 - 40 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 53.049 ms | 0 - 40 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 7.36 ms | 0 - 28 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 10.421 ms | 0 - 40 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 8.957 ms | 0 - 46 MB | FP16 | NPU | [Segment-Anything-Model.tflite](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.tflite) |
+ | SAMDecoder | QCS8450 (Proxy) | QCS8450 Proxy | QNN | 7.801 ms | 4 - 41 MB | FP16 | NPU | Use Export Script |
+ | SAMDecoder | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 10.032 ms | 12 - 12 MB | FP16 | NPU | [Segment-Anything-Model.onnx](https://huggingface.co/qualcomm/Segment-Anything-Model/blob/main/SAMDecoder.onnx) |
 
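The Target Model links in both versions of the table resolve to files hosted in this repo. For anyone picking the assets up directly, a minimal sketch of fetching one with `huggingface_hub` (the filename is taken from the new rows above; adjust it if the repo layout changes):

```python
from huggingface_hub import hf_hub_download

# Download the compiled decoder listed in the table; returns a local cache path.
model_path = hf_hub_download(
    repo_id="qualcomm/Segment-Anything-Model",
    filename="SAMDecoder.tflite",  # swap for SAMDecoder.onnx if you need ONNX
)
print(model_path)
```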
@@ -125,65 +116,11 @@ Profiling Results
 ------------------------------------------------------------
 SAMDecoder
 Device : Samsung Galaxy S23 (13)
- Runtime : ONNX
- Estimated inference time (ms) : 8.4
- Estimated peak memory usage (MB): [1, 57]
- Total # Ops : 868
- Compute Unit(s) : NPU (868 ops)
-
- ------------------------------------------------------------
- SAMEncoderPart1
- Device : Samsung Galaxy S23 (13)
- Runtime : ONNX
- Estimated inference time (ms) : 229.0
- Estimated peak memory usage (MB): [12, 181]
- Total # Ops : 623
- Compute Unit(s) : NPU (623 ops)
-
- ------------------------------------------------------------
- SAMEncoderPart2
- Device : Samsung Galaxy S23 (13)
- Runtime : ONNX
- Estimated inference time (ms) : 781.7
- Estimated peak memory usage (MB): [12, 147]
- Total # Ops : 610
- Compute Unit(s) : NPU (610 ops)
-
- ------------------------------------------------------------
- SAMEncoderPart3
- Device : Samsung Galaxy S23 (13)
- Runtime : ONNX
- Estimated inference time (ms) : 779.7
- Estimated peak memory usage (MB): [12, 159]
- Total # Ops : 610
- Compute Unit(s) : NPU (610 ops)
-
- ------------------------------------------------------------
- SAMEncoderPart4
- Device : Samsung Galaxy S23 (13)
- Runtime : ONNX
- Estimated inference time (ms) : 770.1
- Estimated peak memory usage (MB): [12, 151]
- Total # Ops : 610
- Compute Unit(s) : NPU (610 ops)
-
- ------------------------------------------------------------
- SAMEncoderPart5
- Device : Samsung Galaxy S23 (13)
- Runtime : ONNX
- Estimated inference time (ms) : 772.4
- Estimated peak memory usage (MB): [0, 133]
- Total # Ops : 610
- Compute Unit(s) : NPU (610 ops)
-
- ------------------------------------------------------------
- SAMEncoderPart6
- Device : Samsung Galaxy S23 (13)
- Runtime : ONNX
- Estimated inference time (ms) : 768.7
- Estimated peak memory usage (MB): [12, 148]
- Total # Ops : 610
- Compute Unit(s) : NPU (610 ops)
+ Runtime : TFLITE
+ Estimated inference time (ms) : 7.4
+ Estimated peak memory usage (MB): [0, 33]
+ Total # Ops : 845
+ Compute Unit(s) : NPU (845 ops)
 ```
 
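The summary above is what the export script prints; the same numbers can also be pulled from a finished profile job instead of being read off the README. A hedged sketch, assuming a configured `qai_hub` client; the job ID is a placeholder and the profile dict keys are assumptions to verify against the returned data:

```python
import qai_hub as hub

job = hub.get_job("jabc123")      # placeholder ID; use the one from your submission
profile = job.download_profile()  # raw profile data as a dict

# The keys below are an assumption about the profile schema; print the dict
# and confirm before relying on them.
summary = profile.get("execution_summary", {})
print(summary.get("estimated_inference_time"))  # reported in microseconds
```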
@@ -205,123 +142,26 @@ import qai_hub as hub
 from qai_hub_models.models.sam import Model
 
 # Load the model
- model = Model.from_pretrained()
- decoder_model = model.decoder
- encoder_splits[0]_model = model.encoder_splits[0]
- encoder_splits[1]_model = model.encoder_splits[1]
- encoder_splits[2]_model = model.encoder_splits[2]
- encoder_splits[3]_model = model.encoder_splits[3]
- encoder_splits[4]_model = model.encoder_splits[4]
- encoder_splits[5]_model = model.encoder_splits[5]
+ torch_model = Model.from_pretrained()
 
 # Device
- device = hub.Device("Samsung Galaxy S23")
+ device = hub.Device("Samsung Galaxy S24")
-
- # Trace model
- decoder_input_shape = decoder_model.get_input_spec()
- decoder_sample_inputs = decoder_model.sample_inputs()
-
- traced_decoder_model = torch.jit.trace(decoder_model, [torch.tensor(data[0]) for _, data in decoder_sample_inputs.items()])
-
- # Compile model on a specific device
- decoder_compile_job = hub.submit_compile_job(
-     model=traced_decoder_model,
-     device=device,
-     input_specs=decoder_model.get_input_spec(),
- )
-
- # Get target model to run on-device
- decoder_target_model = decoder_compile_job.get_target_model()
- # Trace model
- encoder_splits[0]_input_shape = encoder_splits[0]_model.get_input_spec()
- encoder_splits[0]_sample_inputs = encoder_splits[0]_model.sample_inputs()
-
- traced_encoder_splits[0]_model = torch.jit.trace(encoder_splits[0]_model, [torch.tensor(data[0]) for _, data in encoder_splits[0]_sample_inputs.items()])
 
- # Compile model on a specific device
- encoder_splits[0]_compile_job = hub.submit_compile_job(
-     model=traced_encoder_splits[0]_model,
-     device=device,
-     input_specs=encoder_splits[0]_model.get_input_spec(),
- )
-
- # Get target model to run on-device
- encoder_splits[0]_target_model = encoder_splits[0]_compile_job.get_target_model()
 # Trace model
- encoder_splits[1]_input_shape = encoder_splits[1]_model.get_input_spec()
- encoder_splits[1]_sample_inputs = encoder_splits[1]_model.sample_inputs()
+ input_shape = torch_model.get_input_spec()
+ sample_inputs = torch_model.sample_inputs()
 
- traced_encoder_splits[1]_model = torch.jit.trace(encoder_splits[1]_model, [torch.tensor(data[0]) for _, data in encoder_splits[1]_sample_inputs.items()])
+ pt_model = torch.jit.trace(torch_model, [torch.tensor(data[0]) for _, data in sample_inputs.items()])
 
 # Compile model on a specific device
- encoder_splits[1]_compile_job = hub.submit_compile_job(
-     model=traced_encoder_splits[1]_model,
+ compile_job = hub.submit_compile_job(
+     model=pt_model,
      device=device,
-     input_specs=encoder_splits[1]_model.get_input_spec(),
+     input_specs=torch_model.get_input_spec(),
 )
 
 # Get target model to run on-device
- encoder_splits[1]_target_model = encoder_splits[1]_compile_job.get_target_model()
+ target_model = compile_job.get_target_model()
- # Trace model
- encoder_splits[2]_input_shape = encoder_splits[2]_model.get_input_spec()
- encoder_splits[2]_sample_inputs = encoder_splits[2]_model.sample_inputs()
-
- traced_encoder_splits[2]_model = torch.jit.trace(encoder_splits[2]_model, [torch.tensor(data[0]) for _, data in encoder_splits[2]_sample_inputs.items()])
-
- # Compile model on a specific device
- encoder_splits[2]_compile_job = hub.submit_compile_job(
-     model=traced_encoder_splits[2]_model,
-     device=device,
-     input_specs=encoder_splits[2]_model.get_input_spec(),
- )
-
- # Get target model to run on-device
- encoder_splits[2]_target_model = encoder_splits[2]_compile_job.get_target_model()
- # Trace model
- encoder_splits[3]_input_shape = encoder_splits[3]_model.get_input_spec()
- encoder_splits[3]_sample_inputs = encoder_splits[3]_model.sample_inputs()
-
- traced_encoder_splits[3]_model = torch.jit.trace(encoder_splits[3]_model, [torch.tensor(data[0]) for _, data in encoder_splits[3]_sample_inputs.items()])
-
- # Compile model on a specific device
- encoder_splits[3]_compile_job = hub.submit_compile_job(
-     model=traced_encoder_splits[3]_model,
-     device=device,
-     input_specs=encoder_splits[3]_model.get_input_spec(),
- )
-
- # Get target model to run on-device
- encoder_splits[3]_target_model = encoder_splits[3]_compile_job.get_target_model()
- # Trace model
- encoder_splits[4]_input_shape = encoder_splits[4]_model.get_input_spec()
- encoder_splits[4]_sample_inputs = encoder_splits[4]_model.sample_inputs()
-
- traced_encoder_splits[4]_model = torch.jit.trace(encoder_splits[4]_model, [torch.tensor(data[0]) for _, data in encoder_splits[4]_sample_inputs.items()])
-
- # Compile model on a specific device
- encoder_splits[4]_compile_job = hub.submit_compile_job(
-     model=traced_encoder_splits[4]_model,
-     device=device,
-     input_specs=encoder_splits[4]_model.get_input_spec(),
- )
-
- # Get target model to run on-device
- encoder_splits[4]_target_model = encoder_splits[4]_compile_job.get_target_model()
- # Trace model
- encoder_splits[5]_input_shape = encoder_splits[5]_model.get_input_spec()
- encoder_splits[5]_sample_inputs = encoder_splits[5]_model.sample_inputs()
-
- traced_encoder_splits[5]_model = torch.jit.trace(encoder_splits[5]_model, [torch.tensor(data[0]) for _, data in encoder_splits[5]_sample_inputs.items()])
-
- # Compile model on a specific device
- encoder_splits[5]_compile_job = hub.submit_compile_job(
-     model=traced_encoder_splits[5]_model,
-     device=device,
-     input_specs=encoder_splits[5]_model.get_input_spec(),
- )
-
- # Get target model to run on-device
- encoder_splits[5]_target_model = encoder_splits[5]_compile_job.get_target_model()
 
 ```
 
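The replacement snippet compiles with default options, which is how the ONNX rows in the table are produced; the TFLITE and QNN rows imply targeting other runtimes. A sketch of requesting one explicitly, reusing the names defined above (the `--target_runtime` flag is taken from AI Hub's compile-options documentation; verify it against the current docs before relying on it):

```python
# Same submission as above, but explicitly requesting a TFLite artifact.
tflite_compile_job = hub.submit_compile_job(
    model=pt_model,
    device=device,
    input_specs=torch_model.get_input_spec(),
    options="--target_runtime tflite",  # assumed flag name per AI Hub docs
)
tflite_target_model = tflite_compile_job.get_target_model()
```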
@@ -333,35 +173,11 @@ After compiling models from step 1, models can be profiled on-device using jobs
 provisioned in the cloud. Once the job is submitted, you can navigate to a
 provided job URL to view a variety of on-device performance metrics.
 ```python
- decoder_profile_job = hub.submit_profile_job(
-     model=decoder_target_model,
+ profile_job = hub.submit_profile_job(
+     model=target_model,
      device=device,
 )
- encoder_splits[0]_profile_job = hub.submit_profile_job(
-     model=encoder_splits[0]_target_model,
-     device=device,
- )
- encoder_splits[1]_profile_job = hub.submit_profile_job(
-     model=encoder_splits[1]_target_model,
-     device=device,
- )
- encoder_splits[2]_profile_job = hub.submit_profile_job(
-     model=encoder_splits[2]_target_model,
-     device=device,
- )
- encoder_splits[3]_profile_job = hub.submit_profile_job(
-     model=encoder_splits[3]_target_model,
-     device=device,
- )
- encoder_splits[4]_profile_job = hub.submit_profile_job(
-     model=encoder_splits[4]_target_model,
-     device=device,
- )
- encoder_splits[5]_profile_job = hub.submit_profile_job(
-     model=encoder_splits[5]_target_model,
-     device=device,
- )
-
+
 ```
 
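Profile jobs run asynchronously, so results are not available the moment the call returns. A small sketch of blocking until the job finishes and locating its results page; `wait()` and `url` are assumptions about the current `qai_hub` Job interface:

```python
# Block until profiling completes, then print the job page with the metrics.
status = profile_job.wait()           # assumed blocking helper on Job objects
print(status, profile_job.url)        # assumed attribute holding the job URL
```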
@@ -369,55 +185,13 @@ Step 3: **Verify on-device accuracy**
 To verify the accuracy of the model on-device, you can run on-device inference
 on sample input data on the same cloud hosted device.
 ```python
- decoder_input_data = decoder_model.sample_inputs()
- decoder_inference_job = hub.submit_inference_job(
-     model=decoder_target_model,
-     device=device,
-     inputs=decoder_input_data,
- )
- decoder_inference_job.download_output_data()
- encoder_splits[0]_input_data = encoder_splits[0]_model.sample_inputs()
- encoder_splits[0]_inference_job = hub.submit_inference_job(
-     model=encoder_splits[0]_target_model,
-     device=device,
-     inputs=encoder_splits[0]_input_data,
- )
- encoder_splits[0]_inference_job.download_output_data()
- encoder_splits[1]_input_data = encoder_splits[1]_model.sample_inputs()
- encoder_splits[1]_inference_job = hub.submit_inference_job(
-     model=encoder_splits[1]_target_model,
-     device=device,
-     inputs=encoder_splits[1]_input_data,
- )
- encoder_splits[1]_inference_job.download_output_data()
- encoder_splits[2]_input_data = encoder_splits[2]_model.sample_inputs()
- encoder_splits[2]_inference_job = hub.submit_inference_job(
-     model=encoder_splits[2]_target_model,
-     device=device,
-     inputs=encoder_splits[2]_input_data,
- )
- encoder_splits[2]_inference_job.download_output_data()
- encoder_splits[3]_input_data = encoder_splits[3]_model.sample_inputs()
- encoder_splits[3]_inference_job = hub.submit_inference_job(
-     model=encoder_splits[3]_target_model,
-     device=device,
-     inputs=encoder_splits[3]_input_data,
- )
- encoder_splits[3]_inference_job.download_output_data()
- encoder_splits[4]_input_data = encoder_splits[4]_model.sample_inputs()
- encoder_splits[4]_inference_job = hub.submit_inference_job(
-     model=encoder_splits[4]_target_model,
-     device=device,
-     inputs=encoder_splits[4]_input_data,
- )
- encoder_splits[4]_inference_job.download_output_data()
- encoder_splits[5]_input_data = encoder_splits[5]_model.sample_inputs()
- encoder_splits[5]_inference_job = hub.submit_inference_job(
-     model=encoder_splits[5]_target_model,
+ input_data = torch_model.sample_inputs()
+ inference_job = hub.submit_inference_job(
+     model=target_model,
      device=device,
-     inputs=encoder_splits[5]_input_data,
+     inputs=input_data,
 )
- encoder_splits[5]_inference_job.download_output_data()
+ on_device_output = inference_job.download_output_data()
 
 ```
 With the output of the model, you can compute metrics like PSNR or relative error, or spot-check the output against the expected output.
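As a concrete instance of the PSNR check mentioned above, a self-contained sketch; `reference_output` is a placeholder for a reference array you computed locally (for example, by running `torch_model` on the same sample inputs):

```python
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; `peak` is the signal's maximum value."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak**2 / mse)

# download_output_data() returns a dict of output name -> list of arrays;
# compare the first on-device output against your locally computed reference.
first_output = np.asarray(next(iter(on_device_output.values()))[0])
print(f"PSNR: {psnr(first_output, reference_output):.2f} dB")
```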