Update README.md
README.md
CHANGED
@@ -1,6 +1,160 @@
---
license: other
license_name: sla0044
license_link: >-
  https://github.com/STMicroelectronics/stm32ai-modelzoo/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/LICENSE.md
pipeline_tag: image-classification
---
# EfficientNet

## **Use case** : `Image classification`

# Model description
EfficientNet was introduced in this [paper](https://arxiv.org/pdf/1905.11946.pdf).
The authors proposed a method that uniformly scales all three network dimensions (depth, width, and resolution) using a single compound coefficient.
Using neural architecture search, the authors created the baseline EfficientNet topology, B0, and derived from it the variants B1 to B7, ordered by increasing complexity.
Its main building blocks are the mobile inverted bottleneck MBConv (Sandler et al., 2018; Tan et al., 2019) and squeeze-and-excitation optimization (Hu et al., 2018).
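
As a rough illustration of compound scaling, the sketch below computes the depth, width, and resolution multipliers for a given compound coefficient `phi`. The constants 1.2, 1.1, and 1.15 are the values the paper reports for the B0 baseline. The function names are invented for this example. The coefficients used for the ST model are not given in this README, so the sketch does not describe that model.

```python
# Compound scaling as described in the EfficientNet paper:
#   depth = alpha**phi, width = beta**phi, resolution = gamma**phi
# with alpha * beta**2 * gamma**2 chosen close to 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # values reported in the paper for B0


def compound_scale(phi: float) -> dict:
    """Return the depth/width/resolution multipliers for coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def flops_factor(phi: float) -> float:
    """Approximate FLOPs growth: (alpha * beta^2 * gamma^2) ** phi."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


for phi in (0, 1, 2):
    print(phi, compound_scale(phi), round(flops_factor(phi), 2))
```

With `phi = 1` the FLOPs factor is about 1.92, which is close to the target of 2 used in the paper. The published B1 to B7 variants may use rounded multipliers, so treat the numbers as illustrative.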

EfficientNet achieved state-of-the-art accuracy on benchmarks such as ImageNet and CIFAR while being much smaller and faster
than comparable models (ResNet, DenseNet, Inception...).
However, for STM32 platforms, B0 is already too large. We therefore derived an in-house custom version tailored for STM32
and modified it to be quantization-friendly (a topic not discussed in the original paper). This custom model is then quantized to int8 using the TensorFlow Lite converter.
In the following, the resulting model is called ST EfficientNet LC v1 (LC standing for Low Complexity).

ST EfficientNet LC v1 was obtained by fine-tuning the original topology. Our goal was to reach around 500 kBytes for RAM and weights.
To achieve this, we replaced the original 'swish' activation with a simple 'relu6' and searched for good expansion factor, depth,
and width coefficients. Many models could meet the requirement; we selected the one that performed best on the food-101 dataset.
We made several attempts to quantize the EfficientNet topology and discovered some issues when quantizing activations.
The problem was fixed mainly by adding a clipping lambda layer before the sigmoid.
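
The two changes described above can be sketched in plain Python. This is a scalar illustration only, not the model's layer code. The clip bound `limit=6.0` is a hypothetical value, since the README does not state the bounds used. One plausible reading is that bounding a tensor's range keeps the int8 step size small.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function (fine for moderate inputs; clip first for large ones)."""
    return 1.0 / (1.0 + math.exp(-x))


def swish(x: float) -> float:
    """Original EfficientNet activation: x * sigmoid(x). Unbounded above."""
    return x * sigmoid(x)


def relu6(x: float) -> float:
    """Replacement activation: output clamped to [0, 6]."""
    return min(max(x, 0.0), 6.0)


def clipped_sigmoid(x: float, limit: float = 6.0) -> float:
    """Sigmoid preceded by a clip to [-limit, limit].

    `limit` is a hypothetical value chosen for illustration.
    """
    return sigmoid(min(max(x, -limit), limit))


for x in (-8.0, -1.0, 0.0, 3.0, 8.0):
    print(x, round(swish(x), 4), relu6(x), round(clipped_sigmoid(x), 4))
```

Because `relu6` has a fixed output range and the clipped sigmoid has a fixed input range, the value ranges seen during quantization calibration are known in advance.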

## Network information
| Network Information | Value                                 |
|---------------------|---------------------------------------|
| Framework           | TensorFlow Lite                       |
| Params              | 517540                                |
| Quantization        | int8                                  |
| Paper               | https://arxiv.org/pdf/1905.11946.pdf  |

The models are quantized using the TensorFlow Lite converter.
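
For readers unfamiliar with int8 quantization, the sketch below shows the usual affine mapping between real values and int8 codes, `real = scale * (q - zero_point)`. It illustrates the arithmetic only and does not reproduce the converter's calibration. The helper names and the example range `[0, 6]` (a `relu6` output) are assumptions made for this example, and Python's `round` is used as a simplification of the rounding rule.

```python
def quant_params(rmin: float, rmax: float, qmin: int = -128, qmax: int = 127):
    """Derive (scale, zero_point) so that [rmin, rmax] maps onto [qmin, qmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # the range must contain 0
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Map a real value to an int8 code, saturating at the range limits."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an int8 code back to an approximate real value."""
    return scale * (q - zero_point)


scale, zero_point = quant_params(0.0, 6.0)  # e.g. the output range of relu6
for x in (0.0, 2.0, 6.0, 10.0):
    q = quantize(x, scale, zero_point)
    print(x, q, round(dequantize(q, scale, zero_point), 4))
```

Values outside the calibrated range, such as 10.0 here, saturate at the largest code.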

## Network inputs / outputs
For an image resolution of NxM and P classes:

| Input Shape   | Description                                              |
|---------------|----------------------------------------------------------|
| (1, N, M, 3)  | Single NxM RGB image with UINT8 values between 0 and 255 |

| Output Shape  | Description                                              |
|---------------|----------------------------------------------------------|
| (1, P)        | Per-class confidence for P classes                       |
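
A minimal sketch of how the `(1, P)` output can be turned into a top-1 prediction is shown below. The function name, the label order, and the sample confidences are assumptions made for this example. The five labels are the class folder names of the Flowers dataset.

```python
def top1(confidences, labels):
    """Return (label, confidence) for the best class of one (1, P) output."""
    scores = confidences[0]  # drop the batch dimension
    if len(scores) != len(labels):
        raise ValueError("expected exactly one label per class")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Hypothetical output for P = 5 classes; the label order is an assumption.
labels = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
output = [[0.02, 0.05, 0.80, 0.10, 0.03]]
print(top1(output, labels))
```

In a real deployment the label order must match the order used during training.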


## Recommended platform
| Platform | Supported | Recommended |
|----------|-----------|-------------|
| STM32L0  | []        | []          |
| STM32L4  | []        | []          |
| STM32U5  | [x]       | []          |
| STM32H7  | [x]       | [x]         |
| STM32MP1 | [x]       | [x]         |
| STM32MP2 | [x]       | []          |
| STM32N6  | [x]       | []          |

---
# Performances

## Metrics
Measurements are made with the default STM32Cube.AI configuration, with the input/output allocated option enabled.

### Reference **NPU** memory footprint on food-101 dataset (see Accuracy for details on dataset)
| Model | Format | Resolution | Series | Internal RAM (KiB) | External RAM (KiB) | Weights Flash (KiB) | STM32Cube.AI version | STEdgeAI Core version |
|-------|--------|------------|--------|--------------------|--------------------|---------------------|----------------------|-----------------------|
| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food-101/st_efficientnet_lc_v1_128_tfs/st_efficientnet_lc_v1_128_tfs_int8.tflite) | Int8 | 128x128x3 | STM32N6 | 256 | 0 | 625.8 | 10.0.0 | 2.0.0 |
| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food-101/st_efficientnet_lc_v1_224_tfs/st_efficientnet_lc_v1_224_tfs_int8.tflite) | Int8 | 224x224x3 | STM32N6 | 784.02 | 0 | 632.55 | 10.0.0 | 2.0.0 |

### Reference **NPU** inference time on food-101 dataset (see Accuracy for details on dataset)
| Model | Format | Resolution | Board | Execution Engine | Inference time (ms) | Inf / sec | STM32Cube.AI version | STEdgeAI Core version |
|-------|--------|------------|-------|------------------|---------------------|-----------|----------------------|-----------------------|
| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food-101/st_efficientnet_lc_v1_128_tfs/st_efficientnet_lc_v1_128_tfs_int8.tflite) | Int8 | 128x128x3 | STM32N6570-DK | NPU/MCU | 6.87 | 145.55 | 10.0.0 | 2.0.0 |
| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food-101/st_efficientnet_lc_v1_224_tfs/st_efficientnet_lc_v1_224_tfs_int8.tflite) | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 15.8 | 63.29 | 10.0.0 | 2.0.0 |
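
The "Inf / sec" column appears to be the reciprocal of the inference time. The sketch below recomputes it from the latencies in the table above. The function name is invented for this example, and the small difference on the first row is presumably due to rounding of the published latency.

```python
def inferences_per_second(latency_ms: float) -> float:
    """Throughput for back-to-back inferences of the given latency."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# (resolution, inference time in ms, reported Inf / sec) from the table above
npu_rows = [("128x128x3", 6.87, 145.55), ("224x224x3", 15.8, 63.29)]
for resolution, latency_ms, reported in npu_rows:
    computed = inferences_per_second(latency_ms)
    print(resolution, round(computed, 2), reported)
```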


### Reference **MCU** memory footprints based on Flowers dataset (see Accuracy for details on dataset)
| Model                     | Format | Resolution | Series  | Activation RAM | Runtime RAM | Weights Flash | Code Flash | Total RAM  | Total Flash | STM32Cube.AI version |
|---------------------------|--------|------------|---------|----------------|-------------|---------------|------------|------------|-------------|----------------------|
| ST EfficientNet LC v1 tfs | Int8   | 224x224x3  | STM32H7 | 430.78 KiB     | 58.19 KiB   | 505.41 KiB    | 158.4 KiB  | 488.97 KiB | 663.81 KiB  | 10.0.0               |
| ST EfficientNet LC v1 tfs | Int8   | 128x128x3  | STM32H7 | 166.78 KiB     | 57.86 KiB   | 505.41 KiB    | 157.68 KiB | 224.64 KiB | 663.09 KiB  | 10.0.0               |
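
In the table above, Total RAM appears to be the sum of Activation RAM and Runtime RAM, and Total Flash the sum of Weights Flash and Code Flash. The sketch below checks this against the published values. This relationship is inferred from the numbers, not stated in the README, and the helper name is invented for this example.

```python
import math

# (resolution, activation, runtime, weights, code, total_ram, total_flash), KiB
mcu_rows = [
    ("224x224x3", 430.78, 58.19, 505.41, 158.40, 488.97, 663.81),
    ("128x128x3", 166.78, 57.86, 505.41, 157.68, 224.64, 663.09),
]


def totals(activation: float, runtime: float, weights: float, code: float):
    """Return (total RAM, total Flash) in KiB as plain sums."""
    return activation + runtime, weights + code


for resolution, act, run, wts, code, ram, flash in mcu_rows:
    total_ram, total_flash = totals(act, run, wts, code)
    ok = (math.isclose(total_ram, ram, abs_tol=0.005)
          and math.isclose(total_flash, flash, abs_tol=0.005))
    print(resolution, round(total_ram, 2), round(total_flash, 2), ok)
```

Only the activation RAM changes noticeably between the two resolutions; the weights are identical, so the Flash footprint is almost the same.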


### Reference **MCU** inference time based on Flowers dataset (see Accuracy for details on dataset)
| Model                     | Format | Resolution | Board            | Execution Engine | Frequency | Inference time (ms) | STM32Cube.AI version |
|---------------------------|--------|------------|------------------|------------------|-----------|---------------------|----------------------|
| ST EfficientNet LC v1 tfs | Int8   | 224x224x3  | STM32H747I-DISCO | 1 CPU            | 400 MHz   | 438.33 ms           | 10.0.0               |
| ST EfficientNet LC v1 tfs | Int8   | 128x128x3  | STM32H747I-DISCO | 1 CPU            | 400 MHz   | 144.96 ms           | 10.0.0               |
| ST EfficientNet LC v1 tfs | Int8   | 224x224x3  | STM32F769I-DISCO | 1 CPU            | 216 MHz   | 871.7 ms            | 10.0.0               |
| ST EfficientNet LC v1 tfs | Int8   | 128x128x3  | STM32F769I-DISCO | 1 CPU            | 216 MHz   | 259.5 ms            | 10.0.0               |


### Reference **MPU** inference time based on Flowers dataset (see Accuracy for details on dataset)
| Model                     | Format | Resolution | Quantization    | Board           | Execution Engine | Frequency | Inference time (ms) | %NPU  | %GPU  | %CPU | X-LINUX-AI version | Framework             |
|---------------------------|--------|------------|-----------------|-----------------|------------------|-----------|---------------------|-------|-------|------|--------------------|-----------------------|
| ST EfficientNet LC v1 tfs | Int8   | 224x224x3  | per-channel\*\* | STM32MP257F-DK2 | NPU/GPU          | 800 MHz   | 36.75 ms            | 16.89 | 83.11 | 0    | v5.1.0             | OpenVX                |
| ST EfficientNet LC v1 tfs | Int8   | 128x128x3  | per-channel\*\* | STM32MP257F-DK2 | NPU/GPU          | 800 MHz   | 14.67 ms            | 32.55 | 67.45 | 0    | v5.1.0             | OpenVX                |
| ST EfficientNet LC v1 tfs | Int8   | 224x224x3  | per-channel     | STM32MP157F-DK2 | 2 CPU            | 800 MHz   | 140.6 ms            | NA    | NA    | 100  | v5.1.0             | TensorFlowLite 2.11.0 |
| ST EfficientNet LC v1 tfs | Int8   | 128x128x3  | per-channel     | STM32MP157F-DK2 | 2 CPU            | 800 MHz   | 47.50 ms            | NA    | NA    | 100  | v5.1.0             | TensorFlowLite 2.11.0 |
| ST EfficientNet LC v1 tfs | Int8   | 224x224x3  | per-channel     | STM32MP135F-DK2 | 1 CPU            | 1000 MHz  | 198.7 ms            | NA    | NA    | 100  | v5.1.0             | TensorFlowLite 2.11.0 |
| ST EfficientNet LC v1 tfs | Int8   | 128x128x3  | per-channel     | STM32MP135F-DK2 | 1 CPU            | 1000 MHz  | 63.84 ms            | NA    | NA    | 100  | v5.1.0             | TensorFlowLite 2.11.0 |

\*\* **To get the most out of MP25 NPU hardware acceleration, please use per-tensor quantization.**
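
To make the per-channel versus per-tensor distinction concrete, the sketch below computes symmetric int8 weight scales both ways for a toy weight matrix. The helper names and weight values are invented for this example. It only illustrates how the two schemes differ and says nothing about how the NPU handles either one.

```python
def symmetric_scale(values, qmax: int = 127) -> float:
    """Scale mapping the largest magnitude in `values` onto qmax."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def per_tensor_scales(weights):
    """One scale shared by every output channel."""
    scale = symmetric_scale([v for channel in weights for v in channel])
    return [scale] * len(weights)


def per_channel_scales(weights):
    """One scale per output channel (one row of `weights`)."""
    return [symmetric_scale(channel) for channel in weights]


# Toy weights: two output channels with very different magnitudes.
weights = [[0.02, -0.01, 0.03], [1.50, -2.00, 0.75]]
print("per-tensor :", per_tensor_scales(weights))
print("per-channel:", per_channel_scales(weights))
```

Per-channel quantization gives the small-magnitude channel a much finer step, while per-tensor quantization uses a single scale for the whole tensor.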


### Accuracy with Flowers dataset
Dataset details: <http://download.tensorflow.org/example_images/flower_photos.tgz>, License CC BY 2.0, reference [[1]](#1)

Number of classes: 5, number of files: 3670

| Model                     | Format | Resolution | Top 1 Accuracy (%) |
|---------------------------|--------|------------|--------------------|
| ST EfficientNet LC v1 tfs | Float  | 224x224x3  | 90.19              |
| ST EfficientNet LC v1 tfs | Int8   | 224x224x3  | 89.92              |
| ST EfficientNet LC v1 tfs | Float  | 128x128x3  | 87.19              |
| ST EfficientNet LC v1 tfs | Int8   | 128x128x3  | 86.78              |


### Accuracy with Plant dataset
Dataset details: <https://data.mendeley.com/datasets/tywbtsjrjv/1>, License CC0 1.0, reference [[2]](#2)

Number of classes: 39, number of files: 55448

| Model                     | Format | Resolution | Top 1 Accuracy (%) |
|---------------------------|--------|------------|--------------------|
| ST EfficientNet LC v1 tfs | Float  | 224x224x3  | 99.86              |
| ST EfficientNet LC v1 tfs | Int8   | 224x224x3  | 99.78              |
| ST EfficientNet LC v1 tfs | Float  | 128x128x3  | 99.76              |
| ST EfficientNet LC v1 tfs | Int8   | 128x128x3  | 99.63              |


### Accuracy with Food-101 dataset
Dataset details: <https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/>, reference [[3]](#3)

Number of classes: 101, number of files: 101000

| Model                     | Format | Resolution | Top 1 Accuracy (%) |
|---------------------------|--------|------------|--------------------|
| ST EfficientNet LC v1 tfs | Float  | 224x224x3  | 74.84              |
| ST EfficientNet LC v1 tfs | Int8   | 224x224x3  | 74.44              |
| ST EfficientNet LC v1 tfs | Float  | 128x128x3  | 63.58              |
| ST EfficientNet LC v1 tfs | Int8   | 128x128x3  | 63.07              |
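
The Float and Int8 rows of the three accuracy tables can be compared directly to see how much accuracy quantization costs. The sketch below does this with the published values. The variable and function names are invented for this example.

```python
# (dataset, resolution) -> (float top-1 %, int8 top-1 %), from the tables above
accuracy = {
    ("Flowers", "224x224x3"): (90.19, 89.92),
    ("Flowers", "128x128x3"): (87.19, 86.78),
    ("Plant", "224x224x3"): (99.86, 99.78),
    ("Plant", "128x128x3"): (99.76, 99.63),
    ("Food-101", "224x224x3"): (74.84, 74.44),
    ("Food-101", "128x128x3"): (63.58, 63.07),
}


def quantization_drop(float_acc: float, int8_acc: float) -> float:
    """Top-1 accuracy lost by int8 quantization, in percentage points."""
    return round(float_acc - int8_acc, 2)


drops = {key: quantization_drop(*pair) for key, pair in accuracy.items()}
worst = max(drops, key=drops.get)
for key, drop in drops.items():
    print(key, drop)
print("largest drop:", worst, drops[worst])
```

On these figures the loss stays below one percentage point in every configuration.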


## Retraining and integration in a simple example

Please refer to the [stm32ai-modelzoo-services GitHub repository](https://github.com/STMicroelectronics/stm32ai-modelzoo-services).


# References

<a id="1">[1]</a>
"tf_flowers: TensorFlow Datasets," TensorFlow. [Online]. Available: https://www.tensorflow.org/datasets/catalog/tf_flowers.

<a id="2">[2]</a>
J, Arun Pandian; Gopal, Geetharamani (2019), "Data for: Identification of Plant Leaf Diseases Using a 9-layer Deep Convolutional Neural Network", Mendeley Data, V1, doi: 10.17632/tywbtsjrjv.1

<a id="3">[3]</a>
L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 -- Mining Discriminative Components with Random Forests," in European Conference on Computer Vision, 2014.