Duarte committed on
Commit
78d2da1
·
verified ·
1 Parent(s): b4cf0d3

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,3 +1,197 @@
1
- ---
2
- license: mit
3
- ---
1
+ # Image Orientation Detector
2
+
3
+ This project implements a deep learning model to detect the orientation of images and determine the rotation needed to correct them. It uses a pre-trained EfficientNetV2 model from PyTorch, fine-tuned for the task of classifying images into four orientation categories: 0°, 90°, 180°, and 270°.
4
+
5
+ The model achieves **97.53% accuracy** on the validation set.
6
+
7
+ ## Training Performance and Model History
8
+
9
+ This model was trained on a single NVIDIA RTX 4080 GPU, taking approximately **3 hours and 20 minutes** to complete.
10
+
11
+ The final model uses `EfficientNetV2-S`, but the project evolved through several iterations:
12
+
13
+ - **ResNet18:** Achieved ~90% accuracy with a model size of around 30MB.
14
+ - **ResNet50:** Improved accuracy to 95.26% with a model size of ~100MB.
15
+ - **EfficientNetV2-S:** Reached the "final" (for now) accuracy of **97.53%** with a model size of ~80MB.
16
+
17
+ ## How It Works
18
+
19
+ The model is trained on a dataset of images, where each image is rotated by 0°, 90°, 180°, and 270°. The model learns to predict which rotation has been applied. The prediction can then be used to determine the correction needed to bring the image to its upright orientation.
20
+
21
+ The four classes correspond to the following rotations:
22
+
23
+ - **Class 0:** Image is correctly oriented (0°).
24
+ - **Class 1:** Image needs to be rotated 90° Counter-Clockwise to be correct.
25
+ - **Class 2:** Image needs to be rotated 180° to be correct.
26
+ - **Class 3:** Image needs to be rotated 90° Clockwise to be correct.
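The class-to-correction mapping above can be sketched in a few lines. This is a minimal illustration on a nested list rather than a real image, and the helper names (`rotate_ccw`, `rotate_cw`, `correct`) are hypothetical, not part of the project's code. It assumes class `k` means "needs `k` quarter turns counter-clockwise", which matches the list above, since Class 3's 90° clockwise turn equals 270° counter-clockwise.

```python
# Hypothetical helpers: a tiny "image" is represented as a list of rows.

def rotate_ccw(grid):
    """Rotate a 2D grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def rotate_cw(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def correct(grid, predicted_class):
    """Apply the correction implied by a predicted class (0-3)."""
    for _ in range(predicted_class % 4):
        grid = rotate_ccw(grid)
    return grid


upright = [[1, 2], [3, 4]]
# An image that was turned 90 degrees clockwise needs a 90 degree
# counter-clockwise correction, so it belongs to Class 1.
sideways = rotate_cw(upright)
restored = correct(sideways, predicted_class=1)
```

On real images, Pillow's `Image.rotate(90 * k, expand=True)` rotates counter-clockwise by the same amount, so it could play the role of `correct` here.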
27
+
28
+ ## Dataset
29
+
30
+ The model was trained on several datasets:
31
+
32
+ - **Microsoft COCO Dataset:** A large-scale object detection, segmentation, and captioning dataset ([link](https://cocodataset.org/)).
33
+ - **AI-Generated vs. Real Images:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/cashbowman/ai-generated-images-vs-real-images)) was included to expose the model to the compositions and orientations typical of art and illustrations.
34
+ - **Personal Images:** A small, curated collection of personal photographs to include unique examples and edge cases.
35
+
36
+ The combined dataset consists of **45,726** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **182,904** samples. This augmented dataset was then split into **146,323 samples for training** and **36,581 samples for validation**.
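The sample counts above can be checked with a short calculation. The 80/20 split ratio used below is an assumption inferred from the reported numbers; the README does not state the ratio explicitly.

```python
unique_images = 45_726
rotations = 4  # 0, 90, 180 and 270 degrees

total_samples = unique_images * rotations

# Assumption: an 80/20 train/validation split, inferred from the counts.
train_samples = total_samples * 80 // 100
val_samples = total_samples - train_samples

print(total_samples, train_samples, val_samples)  # 182904 146323 36581
```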
37
+
38
+ ## Project Structure
39
+
40
+ ```
41
+ image_orientation_detector/
42
+ ├───.gitignore
43
+ ├───config.py # Main configuration file for paths, model, and hyperparameters
44
+ ├───convert_to_onnx.py # Script to convert the PyTorch model to ONNX format
45
+ ├───predict.py # Script for running inference on new images
46
+ ├───README.md # This file
47
+ ├───requirements.txt # Python dependencies
48
+ ├───train.py # Main script for training the model
49
+ ├───data/
50
+ │ ├───upright_images/ # Directory for correctly oriented images
51
+ │ └───cache/ # Directory for cached, pre-rotated images (auto-generated)
52
+ ├───models/
53
+ │ └───best_model.pth # The best trained model weights
54
+ └───src/
55
+ ├───caching.py # Logic for creating the image cache
56
+ ├───dataset.py # PyTorch Dataset classes
57
+ ├───model.py # Model definition (EfficientNetV2)
58
+ └───utils.py # Utility functions (e.g., device setup, transforms)
59
+ ```
60
+
61
+ ## Usage
62
+
63
+ ### Getting Started
64
+
65
+ You can download the pre-trained model (`orientation_model_xx.pth`) and its ONNX version (`orientation_model_xx.onnx`) from the [GitHub Releases](https://github.com/your-repo/your-project/releases) page.
66
+
67
+ Install the required Python packages using the `requirements.txt` file:
68
+
69
+ ```bash
70
+ pip install -r requirements.txt
71
+ ```
72
+
73
+ ### Prediction
74
+
75
+ To predict the orientation of an image or a directory of images, use the `predict.py` script.
76
+
77
+ - **Predict a single image:**
78
+
79
+ ```bash
80
+ python predict.py --input_path /path/to/image.jpg
81
+ ```
82
+ - **Predict all images in a directory:**
83
+
84
+ ```bash
85
+ python predict.py --input_path /path/to/directory/
86
+ ```
87
+
88
+ The script will output the predicted orientation for each image.
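As a rough illustration of how a single `--input_path` argument can cover both cases, the sketch below resolves a path into a list of image files. The function name `collect_images` and the extension list are assumptions made for this example; `predict.py` may handle this differently.

```python
from pathlib import Path

# Assumption: the file extensions treated as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def collect_images(input_path):
    """Return the image files named by a file path or a directory path."""
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() in IMAGE_EXTENSIONS else []
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```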
89
+
90
+ ### ONNX Export and Prediction
91
+
92
+ This project also supports exporting the trained PyTorch model to the ONNX (Open Neural Network Exchange) format. ONNX models can run faster at inference time and can be deployed on systems that don't have PyTorch installed.
93
+
94
+ To convert a `.pth` model to `.onnx`, provide the path to the model file:
95
+
96
+ ```bash
97
+ python convert_to_onnx.py path/to/model.pth
98
+ ```
99
+
100
+ This will create a `model.onnx` file in the same directory.
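For readers curious what such a conversion involves, here is a minimal sketch built on `torch.onnx.export`. It is not the contents of `convert_to_onnx.py`: the function names, the tensor names `input` and `logits`, and the default `image_size` of 224 are assumptions for illustration, and the model is expected to be loaded already.

```python
from pathlib import Path


def onnx_output_path(pth_path):
    """Place the .onnx file next to the .pth file, keeping its stem."""
    return Path(pth_path).with_suffix(".onnx")


def export_to_onnx(model, pth_path, image_size=224):
    """Export an already-loaded torch.nn.Module to ONNX (requires PyTorch)."""
    import torch  # imported lazily so the path helper works without PyTorch

    model.eval()
    # Dummy batch of one RGB image; assumed input shape for the export trace.
    dummy_input = torch.randn(1, 3, image_size, image_size)
    output_path = onnx_output_path(pth_path)
    torch.onnx.export(
        model,
        dummy_input,
        str(output_path),
        input_names=["input"],
        output_names=["logits"],
        # Let the batch dimension vary at inference time.
        dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
    )
    return output_path
```

Marking the batch axis as dynamic lets the exported model accept more than one image per call, which is a common choice but not necessarily the one made in this project.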
101
+
102
+ To predict image orientation using the ONNX model:
103
+
104
+ - **Predict a single image:**
105
+
106
+ ```bash
107
+ python predict_onnx.py --input_path /path/to/image.jpg
108
+ ```
109
+ - **Predict all images in a directory:**
110
+
111
+ ```bash
112
+ python predict_onnx.py --input_path /path/to/directory/
113
+ ```
114
+
115
+ #### ONNX GPU Acceleration (Optional)
116
+
117
+ For even better performance on NVIDIA GPUs, you can install the GPU-enabled version of ONNX Runtime.
118
+
119
+ ```bash
120
+ pip install onnxruntime-gpu
121
+ ```
122
+
123
+ Make sure you have a compatible CUDA toolkit installed on your system. The `predict_onnx.py` script will automatically try to use the CUDA provider if it's available.
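The "use CUDA if available" behavior can be sketched with ONNX Runtime's provider list. This is an assumed implementation, not the code in `predict_onnx.py`; the helper names `pick_providers` and `create_session` are hypothetical.

```python
def pick_providers(available):
    """Prefer CUDA when present and keep CPU as the fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [name for name in preferred if name in available]


def create_session(model_path):
    """Create an inference session (requires onnxruntime to be installed)."""
    import onnxruntime as ort  # imported lazily; only needed for real inference

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Listing the CPU provider after CUDA means ONNX Runtime can still run the model on machines without a usable GPU.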
124
+
125
+ #### Performance Comparison (PyTorch vs. ONNX)
126
+
127
+ For a dataset of 5055 images, the performance on an RTX 4080 running **single-threaded** was:
128
+
129
+ - **PyTorch (`predict.py`):** 135.71 seconds
130
+ - **ONNX (`predict_onnx.py`):** 60.83 seconds
131
+
132
+ This is a reduction in inference time of approximately **55.2%** (roughly 2.2× faster) when using the ONNX model.
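The percentage follows directly from the two timings. The short calculation below reproduces it and also derives the equivalent speedup factor and throughput, using only the numbers reported above.

```python
num_images = 5055
pytorch_seconds = 135.71
onnx_seconds = 60.83

# Fraction of the PyTorch run time that is saved by switching to ONNX.
time_reduction = (pytorch_seconds - onnx_seconds) / pytorch_seconds
# How many times faster the ONNX run is.
speedup = pytorch_seconds / onnx_seconds

print(f"Time reduction: {time_reduction:.1%}")
print(f"Speedup: {speedup:.2f}x")
print(f"PyTorch throughput: {num_images / pytorch_seconds:.1f} images/s")
print(f"ONNX throughput: {num_images / onnx_seconds:.1f} images/s")
```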
133
+
134
+ ### Training
135
+
136
+ This model learns to identify image orientation by training on a dataset of images that you provide. For the model to learn effectively, provide images that are correctly oriented.
137
+
138
+ **Place Images in the `data/upright_images` directory**: All images must be placed in the `data/upright_images` directory. The training script will automatically generate rotated versions (90°, 180°, 270°) of these images and cache them for efficient training.
139
+
140
+ The directory structure should look like this:
141
+
142
+ ```
143
+ data/
144
+ └───upright_images/
145
+ ├───image1.jpg
146
+ ├───image2.png
147
+ └───...
148
+ ```
149
+
150
+ ### Configure the Training
151
+
152
+ All training parameters are centralized in the `config.py` file. Before starting the training, review and adjust the settings to match your hardware and dataset.
153
+
154
+ Key configuration options in `config.py`:
155
+
156
+ - **Paths and Caching**:
157
+
158
+ - `TRAIN_IMAGES_PATH`: Path to upright images. Defaults to `data/upright_images`.
159
+ - `CACHE_PATH`: Directory where rotated images will be cached. Defaults to `data/cache`.
160
+ - `USE_CACHE`: Set to `True` to use the cache on subsequent runs. This significantly speeds up data loading at the cost of considerable disk space.
161
+ - **Model and Training Hyperparameters**:
162
+
163
+ - `MODEL_NAME`: The name of the model architecture to use (e.g., `EfficientNetV2S`).
164
+ - `IMAGE_SIZE`: The resolution to which images will be resized (e.g., `224` for 224x224 pixels).
165
+ - `BATCH_SIZE`: Number of images to process in each batch. Adjust based on your GPU's VRAM.
166
+ - `NUM_EPOCHS`: The total number of times the model will iterate over the entire dataset.
167
+ - `LEARNING_RATE`: The initial learning rate for the optimizer.
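Put together, a `config.py` with these options might look like the sketch below. The two paths and the `224` image size come from the descriptions above; the values for `BATCH_SIZE`, `NUM_EPOCHS`, and `LEARNING_RATE` are placeholders chosen for illustration and are not the settings used to train the released model.

```python
# Illustrative config.py sketch; see the lead-in for which values are assumed.

# Paths and caching
TRAIN_IMAGES_PATH = "data/upright_images"  # default named in the README
CACHE_PATH = "data/cache"                  # default named in the README
USE_CACHE = True

# Model and training hyperparameters
MODEL_NAME = "EfficientNetV2S"
IMAGE_SIZE = 224        # images are resized to 224x224
BATCH_SIZE = 32         # placeholder: lower it if the GPU runs out of VRAM
NUM_EPOCHS = 10         # placeholder
LEARNING_RATE = 1e-4    # placeholder
```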
168
+
169
+ ### Start Training
170
+
171
+ Once all data is in place and the configuration is set, start training the model by running the `train.py` script:
172
+
173
+ ```bash
174
+ python train.py
175
+ ```
176
+
177
+ - **First Run**: The first time the script runs, it will preprocess and cache the dataset. This may take a while depending on the size of the dataset.
178
+ - **Subsequent Runs**: Later runs will be much faster as they will use the cached data.
179
+ - **Monitoring**: Use TensorBoard to monitor training progress by running `tensorboard --logdir=runs`.
180
+
181
+ ### Monitoring with TensorBoard
182
+
183
+ The training script is integrated with TensorBoard to help visualize metrics and understand the model's performance. During training, logs are saved in the `runs/` directory.
184
+
185
+ To launch TensorBoard, run the command:
186
+
187
+ ```bash
188
+ tensorboard --logdir=runs
189
+ ```
190
+
191
+ This starts a web server. Open the provided URL (usually `http://localhost:6006`) in a browser to view the dashboard.
192
+
193
+ In TensorBoard, you can track:
194
+
195
+ - **Accuracy:** `Accuracy/train` and `Accuracy/validation`
196
+ - **Loss:** `Loss/train` and `Loss/validation`
197
+ - **Learning Rate:** `Hyperparameters/learning_rate` to see how it changes over epochs.
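The tag names above map naturally onto `add_scalar` calls. The helper below is a hypothetical sketch of per-epoch logging, not the code in `train.py`; it accepts any writer object exposing `add_scalar(tag, value, step)`, such as `torch.utils.tensorboard.SummaryWriter`.

```python
def log_epoch(writer, epoch, train_loss, val_loss, train_acc, val_acc, lr):
    """Write one epoch of metrics under the tag names listed above."""
    writer.add_scalar("Loss/train", train_loss, epoch)
    writer.add_scalar("Loss/validation", val_loss, epoch)
    writer.add_scalar("Accuracy/train", train_acc, epoch)
    writer.add_scalar("Accuracy/validation", val_acc, epoch)
    writer.add_scalar("Hyperparameters/learning_rate", lr, epoch)


# Example usage with PyTorch installed (log directory name is an assumption):
#   from torch.utils.tensorboard import SummaryWriter
#   writer = SummaryWriter("runs/example")
#   log_epoch(writer, 0, 0.9, 0.8, 0.70, 0.75, 1e-4)
#   writer.close()
```

Using the `Group/name` tag pattern makes TensorBoard place the train and validation curves for each metric in the same section of the dashboard.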
orientation_model_v1_0.9753.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:68cb265f109726a907cf8c08b75e6aee2f22fd97655ce0ab9bd6200d3bceb317
3
+ size 80578424
orientation_model_v1_0.9753.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9d53e043564ab9a4c5882f1617afbaa20841b131032e122eb822eb275b9e12e4
3
+ size 81685326