---
title: UnsolvedMNIST
emoji: 🌍
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.37.2
app_file: app.py
pinned: false
license: mit
---

# The Unsolved MNIST 🔒

**M**odified **N**ational **I**nstitute of **S**tandards and **T**echnology Dataset

###### TODO: Implementation

# Description

###### TODO: Implementation

# Setup

###### TODO: Implementation

# Objective

###### TODO: Implementation

# Logs

###### TODO: Implementation

## Model Summary

```log
========================================================================================================================
Layer (type (var_name))         Input Shape         Output Shape        Param #     Kernel Shape    Mult-Adds
========================================================================================================================
LitMNISTModel (LitMNISTModel)   [32, 1, 28, 28]     [32, 10]            --          --              --
├─Net (model)                   [32, 1, 28, 28]     [32, 10]            --          --              --
│    └─conv1.0.weight                                                   ├─72        [8, 1, 3, 3]
│    └─conv1.2.weight                                                   ├─8         [8]
│    └─conv1.2.bias                                                     ├─8         [8]
│    └─conv1.4.weight                                                   ├─720       [10, 8, 3, 3]
│    └─conv1.6.weight                                                   ├─10        [10]
│    └─conv1.6.bias                                                     ├─10        [10]
│    └─conv1.8.weight                                                   ├─900       [10, 10, 3, 3]
│    └─conv1.10.weight                                                  ├─10        [10]
│    └─conv1.10.bias                                                    ├─10        [10]
│    └─trans1.1.weight                                                  ├─80        [8, 10, 1, 1]
│    └─conv2.0.weight                                                   ├─720       [10, 8, 3, 3]
│    └─conv2.1.weight                                                   ├─10        [10]
│    └─conv2.1.bias                                                     ├─10        [10]
│    └─conv2.4.weight                                                   ├─1,080     [12, 10, 3, 3]
│    └─conv2.5.weight                                                   ├─12        [12]
│    └─conv2.5.bias                                                     ├─12        [12]
│    └─conv2.8.weight                                                   ├─1,296     [12, 12, 3, 3]
│    └─conv2.9.weight                                                   ├─12        [12]
│    └─conv2.9.bias                                                     ├─12        [12]
│    └─trans2.1.weight                                                  ├─96        [8, 12, 1, 1]
│    └─trans2.2.weight                                                  ├─8         [8]
│    └─trans2.2.bias                                                    ├─8         [8]
│    └─conv3.0.weight                                                   ├─720       [10, 8, 3, 3]
│    └─conv3.1.weight                                                   ├─10        [10]
│    └─conv3.1.bias                                                     ├─10        [10]
│    └─conv3.4.weight                                                   ├─1,080     [12, 10, 3, 3]
│    └─conv3.6.weight                                                   ├─12        [12]
│    └─conv3.6.bias                                                     ├─12        [12]
│    └─trans3.0.weight                                                  ├─120       [10, 12, 1, 1]
│    └─trans3.2.weight                                                  ├─10        [10]
│    └─trans3.2.bias                                                    ├─10        [10]
│    └─out4.0.weight                                                    └─900       [10, 10, 3, 3]
│    └─Sequential (conv1)       [32, 1, 28, 28]     [32, 10, 28, 28]    --          --              --
│    │    └─0.weight                                                    ├─72        [8, 1, 3, 3]
│    │    └─2.weight                                                    ├─8         [8]
│    │    └─2.bias                                                      ├─8         [8]
│    │    └─4.weight                                                    ├─720       [10, 8, 3, 3]
│    │    └─6.weight                                                    ├─10        [10]
│    │    └─6.bias                                                      ├─10        [10]
│    │    └─8.weight                                                    ├─900       [10, 10, 3, 3]
│    │    └─10.weight                                                   ├─10        [10]
│    │    └─10.bias                                                     └─10        [10]
│    │    └─Conv2d (0)          [32, 1, 28, 28]     [32, 8, 28, 28]     72          [3, 3]          1,806,336
│    │    │    └─weight                                                 └─72        [1, 8, 3, 3]
│    │    └─ReLU (1)            [32, 8, 28, 28]     [32, 8, 28, 28]     --          --              --
│    │    └─BatchNorm2d (2)     [32, 8, 28, 28]     [32, 8, 28, 28]     16          --              512
│    │    │    └─weight                                                 ├─8         [8]
│    │    │    └─bias                                                   └─8         [8]
│    │    └─Dropout2d (3)       [32, 8, 28, 28]     [32, 8, 28, 28]     --          --              --
│    │    └─Conv2d (4)          [32, 8, 28, 28]     [32, 10, 28, 28]    720         [3, 3]          18,063,360
│    │    │    └─weight                                                 └─720       [8, 10, 3, 3]
│    │    └─ReLU (5)            [32, 10, 28, 28]    [32, 10, 28, 28]    --          --              --
│    │    └─BatchNorm2d (6)     [32, 10, 28, 28]    [32, 10, 28, 28]    20          --              640
│    │    │    └─weight                                                 ├─10        [10]
│    │    │    └─bias                                                   └─10        [10]
│    │    └─Dropout2d (7)       [32, 10, 28, 28]    [32, 10, 28, 28]    --          --              --
│    │    └─Conv2d (8)          [32, 10, 28, 28]    [32, 10, 28, 28]    900         [3, 3]          22,579,200
│    │    │    └─weight                                                 └─900       [10, 10, 3, 3]
│    │    └─ReLU (9)            [32, 10, 28, 28]    [32, 10, 28, 28]    --          --              --
│    │    └─BatchNorm2d (10)    [32, 10, 28, 28]    [32, 10, 28, 28]    20          --              640
│    │    │    └─weight                                                 ├─10        [10]
│    │    │    └─bias                                                   └─10        [10]
│    │    └─Dropout2d (11)      [32, 10, 28, 28]    [32, 10, 28, 28]    --          --              --
│    └─Sequential (trans1)      [32, 10, 28, 28]    [32, 8, 17, 17]     --          --              --
│    │    └─1.weight                                                    └─80        [8, 10, 1, 1]
│    │    └─MaxPool2d (0)       [32, 10, 28, 28]    [32, 10, 15, 15]    --          2               --
│    │    └─Conv2d (1)          [32, 10, 15, 15]    [32, 8, 17, 17]     80          [1, 1]          739,840
│    │    │    └─weight                                                 └─80        [10, 8, 1, 1]
│    └─Sequential (conv2)       [32, 8, 17, 17]     [32, 12, 17, 17]    --          --              --
│    │    └─0.weight                                                    ├─720       [10, 8, 3, 3]
│    │    └─1.weight                                                    ├─10        [10]
│    │    └─1.bias                                                      ├─10        [10]
│    │    └─4.weight                                                    ├─1,080     [12, 10, 3, 3]
│    │    └─5.weight                                                    ├─12        [12]
│    │    └─5.bias                                                      ├─12        [12]
│    │    └─8.weight                                                    ├─1,296     [12, 12, 3, 3]
│    │    └─9.weight                                                    ├─12        [12]
│    │    └─9.bias                                                      └─12        [12]
│    │    └─Conv2d (0)          [32, 8, 17, 17]     [32, 10, 17, 17]    720         [3, 3]          6,658,560
│    │    │    └─weight                                                 └─720       [8, 10, 3, 3]
│    │    └─BatchNorm2d (1)     [32, 10, 17, 17]    [32, 10, 17, 17]    20          --              640
│    │    │    └─weight                                                 ├─10        [10]
│    │    │    └─bias                                                   └─10        [10]
│    │    └─ReLU (2)            [32, 10, 17, 17]    [32, 10, 17, 17]    --          --              --
│    │    └─Dropout2d (3)       [32, 10, 17, 17]    [32, 10, 17, 17]    --          --              --
│    │    └─Conv2d (4)          [32, 10, 17, 17]    [32, 12, 17, 17]    1,080       [3, 3]          9,987,840
│    │    │    └─weight                                                 └─1,080     [10, 12, 3, 3]
│    │    └─BatchNorm2d (5)     [32, 12, 17, 17]    [32, 12, 17, 17]    24          --              768
│    │    │    └─weight                                                 ├─12        [12]
│    │    │    └─bias                                                   └─12        [12]
│    │    └─ReLU (6)            [32, 12, 17, 17]    [32, 12, 17, 17]    --          --              --
│    │    └─Dropout2d (7)       [32, 12, 17, 17]    [32, 12, 17, 17]    --          --              --
│    │    └─Conv2d (8)          [32, 12, 17, 17]    [32, 12, 17, 17]    1,296       [3, 3]          11,985,408
│    │    │    └─weight                                                 └─1,296     [12, 12, 3, 3]
│    │    └─BatchNorm2d (9)     [32, 12, 17, 17]    [32, 12, 17, 17]    24          --              768
│    │    │    └─weight                                                 ├─12        [12]
│    │    │    └─bias                                                   └─12        [12]
│    │    └─ReLU (10)           [32, 12, 17, 17]    [32, 12, 17, 17]    --          --              --
│    │    └─Dropout2d (11)      [32, 12, 17, 17]    [32, 12, 17, 17]    --          --              --
│    └─Sequential (trans2)      [32, 12, 17, 17]    [32, 8, 9, 9]       --          --              --
│    │    └─1.weight                                                    ├─96        [8, 12, 1, 1]
│    │    └─2.weight                                                    ├─8         [8]
│    │    └─2.bias                                                      └─8         [8]
│    │    └─MaxPool2d (0)       [32, 12, 17, 17]    [32, 12, 9, 9]      --          2               --
│    │    └─Conv2d (1)          [32, 12, 9, 9]      [32, 8, 9, 9]       96          [1, 1]          248,832
│    │    │    └─weight                                                 └─96        [12, 8, 1, 1]
│    │    └─BatchNorm2d (2)     [32, 8, 9, 9]       [32, 8, 9, 9]       16          --              512
│    │    │    └─weight                                                 ├─8         [8]
│    │    │    └─bias                                                   └─8         [8]
│    └─Sequential (conv3)       [32, 8, 9, 9]       [32, 12, 9, 9]      --          --              --
│    │    └─0.weight                                                    ├─720       [10, 8, 3, 3]
│    │    └─1.weight                                                    ├─10        [10]
│    │    └─1.bias                                                      ├─10        [10]
│    │    └─4.weight                                                    ├─1,080     [12, 10, 3, 3]
│    │    └─6.weight                                                    ├─12        [12]
│    │    └─6.bias                                                      └─12        [12]
│    │    └─Conv2d (0)          [32, 8, 9, 9]       [32, 10, 9, 9]      720         [3, 3]          1,866,240
│    │    │    └─weight                                                 └─720       [8, 10, 3, 3]
│    │    └─BatchNorm2d (1)     [32, 10, 9, 9]      [32, 10, 9, 9]      20          --              640
│    │    │    └─weight                                                 ├─10        [10]
│    │    │    └─bias                                                   └─10        [10]
│    │    └─ReLU (2)            [32, 10, 9, 9]      [32, 10, 9, 9]      --          --              --
│    │    └─Dropout2d (3)       [32, 10, 9, 9]      [32, 10, 9, 9]      --          --              --
│    │    └─Conv2d (4)          [32, 10, 9, 9]      [32, 12, 9, 9]      1,080       [3, 3]          2,799,360
│    │    │    └─weight                                                 └─1,080     [10, 12, 3, 3]
│    │    └─ReLU (5)            [32, 12, 9, 9]      [32, 12, 9, 9]      --          --              --
│    │    └─BatchNorm2d (6)     [32, 12, 9, 9]      [32, 12, 9, 9]      24          --              768
│    │    │    └─weight                                                 ├─12        [12]
│    │    │    └─bias                                                   └─12        [12]
│    │    └─Dropout2d (7)       [32, 12, 9, 9]      [32, 12, 9, 9]      --          --              --
│    └─Sequential (trans3)      [32, 12, 9, 9]      [32, 10, 4, 4]      --          --              --
│    │    └─0.weight                                                    ├─120       [10, 12, 1, 1]
│    │    └─2.weight                                                    ├─10        [10]
│    │    └─2.bias                                                      └─10        [10]
│    │    └─Conv2d (0)          [32, 12, 9, 9]      [32, 10, 9, 9]      120         [1, 1]          311,040
│    │    │    └─weight                                                 └─120       [12, 10, 1, 1]
│    │    └─MaxPool2d (1)       [32, 10, 9, 9]      [32, 10, 4, 4]      --          2               --
│    │    └─BatchNorm2d (2)     [32, 10, 4, 4]      [32, 10, 4, 4]      20          --              640
│    │    │    └─weight                                                 ├─10        [10]
│    │    │    └─bias                                                   └─10        [10]
│    └─Sequential (out4)        [32, 10, 4, 4]      [32, 10, 1, 1]      --          --              --
│    │    └─0.weight                                                    └─900       [10, 10, 3, 3]
│    │    └─Conv2d (0)          [32, 10, 4, 4]      [32, 10, 4, 4]      900         [3, 3]          460,800
│    │    │    └─weight                                                 └─900       [10, 10, 3, 3]
│    │    └─AvgPool2d (1)       [32, 10, 4, 4]      [32, 10, 1, 1]      --          3               --
========================================================================================================================
Total params: 7,988
Trainable params: 7,988
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 77.51
========================================================================================================================
Input size (MB): 0.10
Forward/backward pass size (MB): 18.40
Params size (MB): 0.03
Estimated Total Size (MB): 18.53
========================================================================================================================
```

## Training Logs

###### TODO: Implementation

```sh
cd /usr/home/:USER:/UnsolvedMNIST
tensorboard --logdir=logs
```

## Performance Profiling

###### TODO: Implementation

```log
-------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                    aten::cudnn_convolution        69.86%     444.597ms        69.86%     444.597ms      37.050ms            12
                         aten::_log_softmax         7.52%      47.831ms         7.52%      47.831ms      47.831ms             1
                            aten::clamp_min         4.42%      28.104ms         4.42%      28.104ms       3.513ms             8
                     aten::cudnn_batch_norm         4.13%      26.264ms         4.20%      26.758ms       2.676ms            10
                                 aten::add_         3.47%      22.086ms         3.47%      22.086ms       2.209ms            10
                           aten::bernoulli_         2.79%      17.777ms         2.79%      17.777ms       2.222ms             8
                                 aten::div_         2.53%      16.126ms         2.53%      16.126ms       2.016ms             8
                                  aten::mul         2.29%      14.584ms         2.29%      14.584ms       1.823ms             8
                           aten::avg_pool2d         0.63%       4.009ms         0.63%       4.009ms       4.009ms             1
              aten::max_pool2d_with_indices         0.54%       3.446ms         0.54%       3.446ms       1.149ms             3
                          aten::convolution         0.39%       2.469ms        70.31%     447.487ms      37.291ms            12
                                 aten::relu         0.28%       1.804ms         4.70%      29.908ms       3.739ms             8
               aten::_batch_norm_impl_index         0.22%       1.430ms         4.43%      28.188ms       2.819ms            10
                           aten::batch_norm         0.16%       1.006ms         4.59%      29.194ms       2.919ms            10
                                aten::empty         0.12%     757.000us         0.12%     757.000us      11.828us            64
                           aten::max_pool2d         0.12%     751.000us         0.66%       4.197ms       1.399ms             3
                          aten::log_softmax         0.10%     653.000us         7.62%      48.484ms      48.484ms             1
                               aten::conv2d         0.10%     636.000us        70.41%     448.123ms      37.344ms            12
                      aten::feature_dropout         0.08%     479.000us         7.71%      49.058ms       6.132ms             8
                                aten::copy_         0.07%     447.000us         0.07%     447.000us      63.857us             7
                         aten::_convolution         0.07%     421.000us        69.92%     445.018ms      37.085ms            12
                                   aten::to         0.05%     291.000us         0.13%     843.000us       7.205us           117
                                aten::zeros         0.04%     270.000us         0.08%     523.000us      87.167us             6
                        aten::empty_strided         0.01%      60.000us         0.01%      60.000us       8.571us             7
                             aten::_to_copy         0.01%      45.000us         0.09%     552.000us      78.857us             7
                                 aten::view         0.01%      39.000us         0.01%      39.000us       3.545us            11
                           aten::empty_like         0.00%      31.000us         0.06%     384.000us      38.400us            10
                            aten::new_empty         0.00%      23.000us         0.01%      92.000us      11.500us             8
    aten::_has_compatible_shallow_copy_type         0.00%       2.000us         0.00%       2.000us       0.031us            64
                                aten::zero_         0.00%       1.000us         0.00%       1.000us       0.167us             6
-------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 636.439ms
```

# Contribution

###### TODO: Implementation