Commit 3bd4938 · 1 Parent(s): 8de6687
model_summary, performance_profiling
README.md CHANGED
@@ -10,4 +10,233 @@ pinned: false
license: mit
---

# The Unsolved MNIST
**M**odified **N**ational **I**nstitute of **S**tandards and **T**echnology Dataset

# Description

# Setup

# Objective

# Logs

## Model Summary
```log
========================================================================================================================
Layer (type (var_name))         Input Shape       Output Shape      Param #   Kernel Shape    Mult-Adds
========================================================================================================================
LitMNISTModel (LitMNISTModel)   [32, 1, 28, 28]   [32, 10]          --        --              --
├─Net (model)                   [32, 1, 28, 28]   [32, 10]          --        --              --
│    └─conv1.0.weight                                               ├─72      [8, 1, 3, 3]
│    └─conv1.2.weight                                               ├─8       [8]
│    └─conv1.2.bias                                                 ├─8       [8]
│    └─conv1.4.weight                                               ├─720     [10, 8, 3, 3]
│    └─conv1.6.weight                                               ├─10      [10]
│    └─conv1.6.bias                                                 ├─10      [10]
│    └─conv1.8.weight                                               ├─900     [10, 10, 3, 3]
│    └─conv1.10.weight                                              ├─10      [10]
│    └─conv1.10.bias                                                ├─10      [10]
│    └─trans1.1.weight                                              ├─80      [8, 10, 1, 1]
│    └─conv2.0.weight                                               ├─720     [10, 8, 3, 3]
│    └─conv2.1.weight                                               ├─10      [10]
│    └─conv2.1.bias                                                 ├─10      [10]
│    └─conv2.4.weight                                               ├─1,080   [12, 10, 3, 3]
│    └─conv2.5.weight                                               ├─12      [12]
│    └─conv2.5.bias                                                 ├─12      [12]
│    └─conv2.8.weight                                               ├─1,296   [12, 12, 3, 3]
│    └─conv2.9.weight                                               ├─12      [12]
│    └─conv2.9.bias                                                 ├─12      [12]
│    └─trans2.1.weight                                              ├─96      [8, 12, 1, 1]
│    └─trans2.2.weight                                              ├─8       [8]
│    └─trans2.2.bias                                                ├─8       [8]
│    └─conv3.0.weight                                               ├─720     [10, 8, 3, 3]
│    └─conv3.1.weight                                               ├─10      [10]
│    └─conv3.1.bias                                                 ├─10      [10]
│    └─conv3.4.weight                                               ├─1,080   [12, 10, 3, 3]
│    └─conv3.6.weight                                               ├─12      [12]
│    └─conv3.6.bias                                                 ├─12      [12]
│    └─trans3.0.weight                                              ├─120     [10, 12, 1, 1]
│    └─trans3.2.weight                                              ├─10      [10]
│    └─trans3.2.bias                                                ├─10      [10]
│    └─out4.0.weight                                                └─900     [10, 10, 3, 3]
│    └─Sequential (conv1)       [32, 1, 28, 28]   [32, 10, 28, 28]  --        --              --
│    │    └─0.weight                                                ├─72      [8, 1, 3, 3]
│    │    └─2.weight                                                ├─8       [8]
│    │    └─2.bias                                                  ├─8       [8]
│    │    └─4.weight                                                ├─720     [10, 8, 3, 3]
│    │    └─6.weight                                                ├─10      [10]
│    │    └─6.bias                                                  ├─10      [10]
│    │    └─8.weight                                                ├─900     [10, 10, 3, 3]
│    │    └─10.weight                                               ├─10      [10]
│    │    └─10.bias                                                 └─10      [10]
│    │    └─Conv2d (0)          [32, 1, 28, 28]   [32, 8, 28, 28]   72        [3, 3]          1,806,336
│    │    │    └─weight                                             └─72      [1, 8, 3, 3]
│    │    └─ReLU (1)            [32, 8, 28, 28]   [32, 8, 28, 28]   --        --              --
│    │    └─BatchNorm2d (2)     [32, 8, 28, 28]   [32, 8, 28, 28]   16        --              512
│    │    │    └─weight                                             ├─8       [8]
│    │    │    └─bias                                               └─8       [8]
│    │    └─Dropout2d (3)       [32, 8, 28, 28]   [32, 8, 28, 28]   --        --              --
│    │    └─Conv2d (4)          [32, 8, 28, 28]   [32, 10, 28, 28]  720       [3, 3]          18,063,360
│    │    │    └─weight                                             └─720     [8, 10, 3, 3]
│    │    └─ReLU (5)            [32, 10, 28, 28]  [32, 10, 28, 28]  --        --              --
│    │    └─BatchNorm2d (6)     [32, 10, 28, 28]  [32, 10, 28, 28]  20        --              640
│    │    │    └─weight                                             ├─10      [10]
│    │    │    └─bias                                               └─10      [10]
│    │    └─Dropout2d (7)       [32, 10, 28, 28]  [32, 10, 28, 28]  --        --              --
│    │    └─Conv2d (8)          [32, 10, 28, 28]  [32, 10, 28, 28]  900       [3, 3]          22,579,200
│    │    │    └─weight                                             └─900     [10, 10, 3, 3]
│    │    └─ReLU (9)            [32, 10, 28, 28]  [32, 10, 28, 28]  --        --              --
│    │    └─BatchNorm2d (10)    [32, 10, 28, 28]  [32, 10, 28, 28]  20        --              640
│    │    │    └─weight                                             ├─10      [10]
│    │    │    └─bias                                               └─10      [10]
│    │    └─Dropout2d (11)      [32, 10, 28, 28]  [32, 10, 28, 28]  --        --              --
│    └─Sequential (trans1)      [32, 10, 28, 28]  [32, 8, 17, 17]   --        --              --
│    │    └─1.weight                                                └─80      [8, 10, 1, 1]
│    │    └─MaxPool2d (0)       [32, 10, 28, 28]  [32, 10, 15, 15]  --        2               --
│    │    └─Conv2d (1)          [32, 10, 15, 15]  [32, 8, 17, 17]   80        [1, 1]          739,840
│    │    │    └─weight                                             └─80      [10, 8, 1, 1]
│    └─Sequential (conv2)       [32, 8, 17, 17]   [32, 12, 17, 17]  --        --              --
│    │    └─0.weight                                                ├─720     [10, 8, 3, 3]
│    │    └─1.weight                                                ├─10      [10]
│    │    └─1.bias                                                  ├─10      [10]
│    │    └─4.weight                                                ├─1,080   [12, 10, 3, 3]
│    │    └─5.weight                                                ├─12      [12]
│    │    └─5.bias                                                  ├─12      [12]
│    │    └─8.weight                                                ├─1,296   [12, 12, 3, 3]
│    │    └─9.weight                                                ├─12      [12]
│    │    └─9.bias                                                  └─12      [12]
│    │    └─Conv2d (0)          [32, 8, 17, 17]   [32, 10, 17, 17]  720       [3, 3]          6,658,560
│    │    │    └─weight                                             └─720     [8, 10, 3, 3]
│    │    └─BatchNorm2d (1)     [32, 10, 17, 17]  [32, 10, 17, 17]  20        --              640
│    │    │    └─weight                                             ├─10      [10]
│    │    │    └─bias                                               └─10      [10]
│    │    └─ReLU (2)            [32, 10, 17, 17]  [32, 10, 17, 17]  --        --              --
│    │    └─Dropout2d (3)       [32, 10, 17, 17]  [32, 10, 17, 17]  --        --              --
│    │    └─Conv2d (4)          [32, 10, 17, 17]  [32, 12, 17, 17]  1,080     [3, 3]          9,987,840
│    │    │    └─weight                                             └─1,080   [10, 12, 3, 3]
│    │    └─BatchNorm2d (5)     [32, 12, 17, 17]  [32, 12, 17, 17]  24        --              768
│    │    │    └─weight                                             ├─12      [12]
│    │    │    └─bias                                               └─12      [12]
│    │    └─ReLU (6)            [32, 12, 17, 17]  [32, 12, 17, 17]  --        --              --
│    │    └─Dropout2d (7)       [32, 12, 17, 17]  [32, 12, 17, 17]  --        --              --
│    │    └─Conv2d (8)          [32, 12, 17, 17]  [32, 12, 17, 17]  1,296     [3, 3]          11,985,408
│    │    │    └─weight                                             └─1,296   [12, 12, 3, 3]
│    │    └─BatchNorm2d (9)     [32, 12, 17, 17]  [32, 12, 17, 17]  24        --              768
│    │    │    └─weight                                             ├─12      [12]
│    │    │    └─bias                                               └─12      [12]
│    │    └─ReLU (10)           [32, 12, 17, 17]  [32, 12, 17, 17]  --        --              --
│    │    └─Dropout2d (11)      [32, 12, 17, 17]  [32, 12, 17, 17]  --        --              --
│    └─Sequential (trans2)      [32, 12, 17, 17]  [32, 8, 9, 9]     --        --              --
│    │    └─1.weight                                                ├─96      [8, 12, 1, 1]
│    │    └─2.weight                                                ├─8       [8]
│    │    └─2.bias                                                  └─8       [8]
│    │    └─MaxPool2d (0)       [32, 12, 17, 17]  [32, 12, 9, 9]    --        2               --
│    │    └─Conv2d (1)          [32, 12, 9, 9]    [32, 8, 9, 9]     96        [1, 1]          248,832
│    │    │    └─weight                                             └─96      [12, 8, 1, 1]
│    │    └─BatchNorm2d (2)     [32, 8, 9, 9]     [32, 8, 9, 9]     16        --              512
│    │    │    └─weight                                             ├─8       [8]
│    │    │    └─bias                                               └─8       [8]
│    └─Sequential (conv3)       [32, 8, 9, 9]     [32, 12, 9, 9]    --        --              --
│    │    └─0.weight                                                ├─720     [10, 8, 3, 3]
│    │    └─1.weight                                                ├─10      [10]
│    │    └─1.bias                                                  ├─10      [10]
│    │    └─4.weight                                                ├─1,080   [12, 10, 3, 3]
│    │    └─6.weight                                                ├─12      [12]
│    │    └─6.bias                                                  └─12      [12]
│    │    └─Conv2d (0)          [32, 8, 9, 9]     [32, 10, 9, 9]    720       [3, 3]          1,866,240
│    │    │    └─weight                                             └─720     [8, 10, 3, 3]
│    │    └─BatchNorm2d (1)     [32, 10, 9, 9]    [32, 10, 9, 9]    20        --              640
│    │    │    └─weight                                             ├─10      [10]
│    │    │    └─bias                                               └─10      [10]
│    │    └─ReLU (2)            [32, 10, 9, 9]    [32, 10, 9, 9]    --        --              --
│    │    └─Dropout2d (3)       [32, 10, 9, 9]    [32, 10, 9, 9]    --        --              --
│    │    └─Conv2d (4)          [32, 10, 9, 9]    [32, 12, 9, 9]    1,080     [3, 3]          2,799,360
│    │    │    └─weight                                             └─1,080   [10, 12, 3, 3]
│    │    └─ReLU (5)            [32, 12, 9, 9]    [32, 12, 9, 9]    --        --              --
│    │    └─BatchNorm2d (6)     [32, 12, 9, 9]    [32, 12, 9, 9]    24        --              768
│    │    │    └─weight                                             ├─12      [12]
│    │    │    └─bias                                               └─12      [12]
│    │    └─Dropout2d (7)       [32, 12, 9, 9]    [32, 12, 9, 9]    --        --              --
│    └─Sequential (trans3)      [32, 12, 9, 9]    [32, 10, 4, 4]    --        --              --
│    │    └─0.weight                                                ├─120     [10, 12, 1, 1]
│    │    └─2.weight                                                ├─10      [10]
│    │    └─2.bias                                                  └─10      [10]
│    │    └─Conv2d (0)          [32, 12, 9, 9]    [32, 10, 9, 9]    120       [1, 1]          311,040
│    │    │    └─weight                                             └─120     [12, 10, 1, 1]
│    │    └─MaxPool2d (1)       [32, 10, 9, 9]    [32, 10, 4, 4]    --        2               --
│    │    └─BatchNorm2d (2)     [32, 10, 4, 4]    [32, 10, 4, 4]    20        --              640
│    │    │    └─weight                                             ├─10      [10]
│    │    │    └─bias                                               └─10      [10]
│    └─Sequential (out4)        [32, 10, 4, 4]    [32, 10, 1, 1]    --        --              --
│    │    └─0.weight                                                └─900     [10, 10, 3, 3]
│    │    └─Conv2d (0)          [32, 10, 4, 4]    [32, 10, 4, 4]    900       [3, 3]          460,800
│    │    │    └─weight                                             └─900     [10, 10, 3, 3]
│    │    └─AvgPool2d (1)       [32, 10, 4, 4]    [32, 10, 1, 1]    --        3               --
========================================================================================================================
Total params: 7,988
Trainable params: 7,988
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 77.51
========================================================================================================================
Input size (MB): 0.10
Forward/backward pass size (MB): 18.40
Params size (MB): 0.03
Estimated Total Size (MB): 18.53
========================================================================================================================
```
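
The totals in the summary can be cross-checked by hand from the shapes it prints. The sketch below is not part of the project's code. It assumes bias-free convolutions (the summary lists only a `weight` for each `Conv2d`), one weight and one bias per channel for `BatchNorm2d`, float32 storage (4 bytes per value), and that the Mult-Adds column equals parameters × output positions × batch size for convolutions and parameters × batch size for batch norm, which is what the rows above are consistent with. The `LAYERS` list and variable names are illustrative only.

```python
# Cross-check of the totals printed in the summary, using only its shapes.
# Assumptions (not taken from the project's code): convolutions carry no bias,
# BatchNorm2d has one weight and one bias per channel, values are float32,
# and Mult-Adds is params * output positions * batch for convolutions and
# params * batch for batch norm.

BATCH = 32

# ("conv", in_channels, out_channels, kernel, output height/width) or ("bn", channels)
LAYERS = [
    ("conv", 1, 8, 3, 28), ("bn", 8),        # conv1
    ("conv", 8, 10, 3, 28), ("bn", 10),
    ("conv", 10, 10, 3, 28), ("bn", 10),
    ("conv", 10, 8, 1, 17),                   # trans1
    ("conv", 8, 10, 3, 17), ("bn", 10),       # conv2
    ("conv", 10, 12, 3, 17), ("bn", 12),
    ("conv", 12, 12, 3, 17), ("bn", 12),
    ("conv", 12, 8, 1, 9), ("bn", 8),         # trans2
    ("conv", 8, 10, 3, 9), ("bn", 10),        # conv3
    ("conv", 10, 12, 3, 9), ("bn", 12),
    ("conv", 12, 10, 1, 9), ("bn", 10),       # trans3
    ("conv", 10, 10, 3, 4),                   # out4
]

total_params = 0
total_mult_adds = 0
for kind, *dims in LAYERS:
    if kind == "conv":
        c_in, c_out, k, hw = dims
        params = c_in * c_out * k * k          # weight only, no bias
        total_mult_adds += params * hw * hw * BATCH
    else:
        (channels,) = dims
        params = 2 * channels                  # per-channel weight and bias
        total_mult_adds += params * BATCH
    total_params += params

params_mb = total_params * 4 / 1e6             # float32 parameters
input_mb = BATCH * 1 * 28 * 28 * 4 / 1e6       # one [32, 1, 28, 28] float32 batch

print(total_params)                            # 7988
print(round(total_mult_adds / 1e6, 2))         # 77.51
print(round(params_mb, 2), round(input_mb, 2)) # 0.03 0.1
```

Under these assumptions the sums reproduce the reported 7,988 parameters and 77.51 million mult-adds, which suggests the `Units.MEGABYTES` label on the mult-adds line is a formatting quirk of the summary tool rather than a size in bytes.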

## Training Logs

```sh
cd /usr/home/:USER:/UnsolvedMNIST
tensorboard --logdir=logs
```

## Performance Profiling
```log
-------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                    aten::cudnn_convolution        69.86%     444.597ms        69.86%     444.597ms      37.050ms            12
                         aten::_log_softmax         7.52%      47.831ms         7.52%      47.831ms      47.831ms             1
                            aten::clamp_min         4.42%      28.104ms         4.42%      28.104ms       3.513ms             8
                     aten::cudnn_batch_norm         4.13%      26.264ms         4.20%      26.758ms       2.676ms            10
                                 aten::add_         3.47%      22.086ms         3.47%      22.086ms       2.209ms            10
                           aten::bernoulli_         2.79%      17.777ms         2.79%      17.777ms       2.222ms             8
                                 aten::div_         2.53%      16.126ms         2.53%      16.126ms       2.016ms             8
                                  aten::mul         2.29%      14.584ms         2.29%      14.584ms       1.823ms             8
                           aten::avg_pool2d         0.63%       4.009ms         0.63%       4.009ms       4.009ms             1
              aten::max_pool2d_with_indices         0.54%       3.446ms         0.54%       3.446ms       1.149ms             3
                          aten::convolution         0.39%       2.469ms        70.31%     447.487ms      37.291ms            12
                                 aten::relu         0.28%       1.804ms         4.70%      29.908ms       3.739ms             8
               aten::_batch_norm_impl_index         0.22%       1.430ms         4.43%      28.188ms       2.819ms            10
                           aten::batch_norm         0.16%       1.006ms         4.59%      29.194ms       2.919ms            10
                                aten::empty         0.12%     757.000us         0.12%     757.000us      11.828us            64
                           aten::max_pool2d         0.12%     751.000us         0.66%       4.197ms       1.399ms             3
                          aten::log_softmax         0.10%     653.000us         7.62%      48.484ms      48.484ms             1
                               aten::conv2d         0.10%     636.000us        70.41%     448.123ms      37.344ms            12
                      aten::feature_dropout         0.08%     479.000us         7.71%      49.058ms       6.132ms             8
                                aten::copy_         0.07%     447.000us         0.07%     447.000us      63.857us             7
                         aten::_convolution         0.07%     421.000us        69.92%     445.018ms      37.085ms            12
                                   aten::to         0.05%     291.000us         0.13%     843.000us       7.205us           117
                                aten::zeros         0.04%     270.000us         0.08%     523.000us      87.167us             6
                        aten::empty_strided         0.01%      60.000us         0.01%      60.000us       8.571us             7
                             aten::_to_copy         0.01%      45.000us         0.09%     552.000us      78.857us             7
                                 aten::view         0.01%      39.000us         0.01%      39.000us       3.545us            11
                           aten::empty_like         0.00%      31.000us         0.06%     384.000us      38.400us            10
                            aten::new_empty         0.00%      23.000us         0.01%      92.000us      11.500us             8
    aten::_has_compatible_shallow_copy_type         0.00%       2.000us         0.00%       2.000us       0.031us            64
                                aten::zero_         0.00%       1.000us         0.00%       1.000us       0.167us             6
-------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 636.439ms
```
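
The derived columns in the profiler table follow from the raw timings: each percentage is a time divided by "Self CPU time total", and "CPU time avg" is "CPU total" divided by the number of calls. "Self" time excludes time spent in child operators, which is why `aten::conv2d` shows 0.10% self but 70.41% total. A minimal sketch using three rows copied from the table; the `Row` class and its property names are illustrative, not part of any profiler API.

```python
from dataclasses import dataclass

TOTAL_SELF_CPU_MS = 636.439  # "Self CPU time total" reported under the table


@dataclass
class Row:
    """One profiler row; times are in milliseconds."""
    name: str
    self_cpu_ms: float
    cpu_total_ms: float
    calls: int

    @property
    def self_cpu_pct(self) -> float:
        # Time spent in this operator itself, excluding its children.
        return 100 * self.self_cpu_ms / TOTAL_SELF_CPU_MS

    @property
    def cpu_total_pct(self) -> float:
        # Time spent in this operator including its children.
        return 100 * self.cpu_total_ms / TOTAL_SELF_CPU_MS

    @property
    def cpu_time_avg_ms(self) -> float:
        return self.cpu_total_ms / self.calls


ROWS = [
    Row("aten::cudnn_convolution", 444.597, 444.597, 12),
    Row("aten::_log_softmax", 47.831, 47.831, 1),
    Row("aten::conv2d", 0.636, 448.123, 12),
]

for row in ROWS:
    print(f"{row.name:<26} self {row.self_cpu_pct:6.2f}%  "
          f"total {row.cpu_total_pct:6.2f}%  avg {row.cpu_time_avg_ms:.3f}ms")
```

Read this way, the convolution kernels account for roughly 70% of the profiled CPU time, with `log_softmax`, dropout and batch norm making up most of the rest.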


# Credits:
- [pytorch_performance_profiling.md](https://gist.github.com/mingfeima/e08310d7e7bb9ae2a693adecf2d8a916)
- [FLOPs calculation](https://medium.com/@dzmitrybahdanau/the-flops-calculus-of-language-model-training-3b19c1f025e4)
- [software 2.0](https://karpathy.medium.com/software-2-0-a64152b37c35)
- [weight init](https://towardsdatascience.com/weight-initialization-in-neural-networks-a-journey-from-the-basics-to-kaiming-954fb9b47c79)