---
library_name: tf-keras
tags:
- image-classification
- Architecture
---

# TensorFlow Keras implementation of [Image classification with ConvMixer](https://keras.io/examples/vision/convmixer/)

Full credit goes to [Sayak Paul](https://twitter.com/RisingSayak).

## Short description

ConvMixer is a simple model built on two ideas: representing an image as patches (as in ViT) and separating the mixing of spatial and channel dimensions (as in MLP-Mixer). Unlike ViT and MLP-Mixer, it uses only standard convolution operations. The paper was submitted to ICLR 2022 and is available [here](https://openreview.net/pdf?id=TVHS5Y4dNvM).
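
The description above can be sketched in Keras roughly as follows. This is a minimal sketch, not the exact code behind this checkpoint: the function names, the patch size of 2, the kernel size of 5, and the placement of GELU and batch normalization are assumptions made for illustration. The depthwise convolution mixes spatial locations within each channel, and the 1x1 convolution mixes channels at each location.

```python
import tensorflow as tf
from tensorflow.keras import layers


def activation_block(x):
    # GELU followed by batch normalization (assumed ordering).
    x = layers.Activation("gelu")(x)
    return layers.BatchNormalization()(x)


def conv_mixer_block(x, filters, kernel_size):
    # Spatial mixing: depthwise convolution with a residual connection.
    residual = x
    x = layers.DepthwiseConv2D(kernel_size=kernel_size, padding="same")(x)
    x = layers.Add()([activation_block(x), residual])
    # Channel mixing: pointwise (1x1) convolution.
    x = layers.Conv2D(filters, kernel_size=1)(x)
    return activation_block(x)


def build_convmixer(image_size=32, filters=256, depth=8,
                    kernel_size=5, patch_size=2, num_classes=10):
    # patch_size and kernel_size are illustrative assumptions.
    inputs = tf.keras.Input((image_size, image_size, 3))
    x = layers.Rescaling(1.0 / 255)(inputs)
    # Patch embedding: a strided convolution turns each patch into a vector.
    x = layers.Conv2D(filters, kernel_size=patch_size, strides=patch_size)(x)
    x = activation_block(x)
    for _ in range(depth):
        x = conv_mixer_block(x, filters, kernel_size)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model = build_convmixer()
print(model.output_shape)
```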

## Model and Dataset used

The dataset is CIFAR-10. The model is ConvMixer-256/8, where 256 is the hidden dimension (the dimension of each patch embedding) and 8 is the depth (the number of repeated ConvMixer layers).
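
To give a feel for what "256/8" implies in size, the arithmetic below estimates the feature-map resolution and parameter count. It is a rough sketch under stated assumptions that this card does not confirm: a patch size of 2, a depthwise kernel size of 5, biases on every convolution, and one batch normalization layer (4 values per channel) after each activation. The helper names are hypothetical.

```python
def feature_map_size(image_size=32, patch_size=2):
    # A strided patch embedding shrinks each side by the patch size.
    return image_size // patch_size


def convmixer_param_count(filters=256, depth=8, kernel_size=5,
                          patch_size=2, in_channels=3, num_classes=10):
    # Batch norm stores gamma, beta, moving mean, and moving variance.
    batch_norm = 4 * filters
    # Patch embedding convolution (weights + biases).
    stem = patch_size * patch_size * in_channels * filters + filters
    # Depthwise convolution: one kernel per channel (spatial mixing).
    depthwise = kernel_size * kernel_size * filters + filters
    # Pointwise 1x1 convolution (channel mixing).
    pointwise = filters * filters + filters
    block = depthwise + batch_norm + pointwise + batch_norm
    # Classification head after global average pooling.
    head = filters * num_classes + num_classes
    return stem + batch_norm + depth * block + head


side = feature_map_size()
print(f"feature map: {side}x{side}, parameters: {convmixer_param_count():,}")
```

Under these assumptions, almost all parameters sit in the pointwise convolutions, so the count grows roughly with the square of the hidden dimension and linearly with depth.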

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

| Hyperparameters | Value |
| :-- | :-- |
| name | AdamW |
| learning_rate | 0.0010000000474974513 |
| decay | 0.0 |
| beta_1 | 0.8999999761581421 |
| beta_2 | 0.9990000128746033 |
| epsilon | 1e-07 |
| amsgrad | False |
| weight_decay | 9.999999747378752e-05 |
| exclude_from_weight_decay | None |
| training_precision | float32 |
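
The long decimals in the table look like float32 roundings of simpler values. The sketch below, which uses only the standard library, rounds the presumed nominal values to float32 and prints them; the nominal values (0.001, 0.9, 0.999, 1e-4) are inferred from the table, not stated by the training code.

```python
import struct


def as_float32(x: float) -> float:
    # Round a Python float to float32 precision, then widen it back.
    return struct.unpack("f", struct.pack("f", x))[0]


# Nominal values are assumptions inferred from the table above.
nominal = {
    "learning_rate": 1e-3,
    "beta_1": 0.9,
    "beta_2": 0.999,
    "weight_decay": 1e-4,
}

for name, value in nominal.items():
    print(f"{name}: {value} -> {as_float32(value)!r}")
```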

## Training Metrics
After 10 epochs, the model reaches a test accuracy of 83.57%.

## Model Plot

<details>
<summary>View Model Plot</summary>

![Model Image](./model.png)

</details>