ummtushar committed on
Commit 60b0ddc · verified · 1 Parent(s): a56b30f

initial commit
README.md CHANGED
@@ -1,3 +1,98 @@
- ---
- license: mit
- ---
+ # How to run the code
+ Here is a brief guide to running our repository:
+ 1. Create a virtual environment
+ 2. Run `pip install -r requirements.txt` in the terminal
+ 3. Run the `dc1\image_dataset.py` file to download the data
+ 4. Run `dc1\main.py` to train and test the model
+
+ # How to change to a Selected Model
+ 1. In `main.py` you can select the model by commenting out the models you are not interested in and uncommenting the one you would like to use.
+    Exception: for the binary model you also need to comment out the following:
+    `train_dataset = ImageDataset(Path("data/X_train.npy"), Path("data/Y_train.npy"))` and `test_dataset = ImageDataset(Path("data/X_test.npy"), Path("data/Y_test.npy"))`
+    and uncomment the loading of the binary dataset:
+    `train_dataset = ImageDatasetBINARY(Path("data/X_train.npy"), Path("data/Y_train.npy"))` and `test_dataset = ImageDatasetBINARY(Path("data/X_test.npy"), Path("data/Y_test.npy"))`
+
+ # Data-Challenge-1-template-code
+ This repository contains the template code for the TU/e course JBG040 Data Challenge 1.
+ Please read this document carefully as it has been filled out with important information.
+
+ ## Code structure
+ The template code is structured into multiple files, based on their functionality.
+ There are five `.py` files in total, each containing a different part of the code.
+ Feel free to create new files to explore the data or experiment with other ideas.
+
+ - To download the data: run the `image_dataset.py` file. The script will create a directory `/data/` and download the training and test data with corresponding labels to this directory.
+   You will usually only have to run this script once, at the beginning of your project.
+ - To run the whole training/evaluation pipeline: run `main.py`. This script is prepared to do the following:
+   - Load your train and test data (make sure it is downloaded beforehand!)
+   - Initialize the neural network as defined in the `net.py` file.
+   - Initialize loss functions and optimizers. If you want to change the loss function/optimizer, do it here.
+   - Define the number of training epochs and the batch size.
+   - Check for and enable GPU acceleration for training (if you have a CUDA- or Apple Silicon-enabled device).
+   - Train the neural network and evaluate it on the test set at the end of each epoch.
+   - Plot the training losses, both in the command line during training and as a PNG saved in the `/artifacts/` subdirectory.
+   - Finally, save your trained model's weights in the `/model_weights/` subdirectory so that you can reload them later.
+
+ In your project, you are free to modify any part of this code based on your needs.
+ Note that the neural network structure is defined in the `net.py` file, so if you want to modify the network itself, you can do so in that script.
+ The loss functions and optimizers are all defined in `main.py`.
+
+ ## GitHub setup instructions
+ 1. Click the green *<> Code* button at the upper right corner of the repository.
+ 2. Make sure that the tab *Local* is selected and click *Download ZIP*.
+ 3. Go to the GitHub homepage and create a new repository.
+ 4. Make sure that the repository is set to **private** and give it the name **JBG040-GroupXX**, where XX is your group number.
+ 5. Press *uploading an existing file* and upload the extracted files from Data-Challenge-1-template-main.zip to your repository. Note that for the initial commit you should commit directly to the main branch.
+ 6. Invite your **group members, tutor and teachers** by going to *Settings > Collaborators > Add people*.
+ 7. Open PyCharm and make sure that your GitHub account is linked.*
+ 8. In the welcome screen of PyCharm, click *Get from VCS > GitHub*, select your repository and click *Clone*.
+ 9. After the repository is cloned, you can create a virtual environment using the requirements.txt.
+
+ *For information on how to install PyCharm and link GitHub to your PyCharm, we refer to the additional resources page on Canvas.
+
+ ## Environment setup instructions
+ We recommend setting up a virtual Python environment to install the package and its dependencies. To install the package, execute `pip install -r requirements.txt` in the command line. This will install it in editable mode, meaning there is no need to reinstall after making changes. If you are using PyCharm, it should offer you the option to create a virtual environment from the requirements file on startup. Note that even in this case, it is still necessary to run the pip command described above.
+
+ ## Submission instructions
+ After each sprint, you are expected to submit your code. This will **not** be done in Canvas; instead, you will create a release of your current repository.
+ A release is essentially a snapshot of your repository taken at a specific time.
+ Your future modifications are not going to affect this release.
+ **Note that you are not allowed to update your old releases after the deadline.**
+ For more information on releases, see the [GitHub releases](https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases) page.
+
+ 1. Make sure that your code runs without issues and that **everything is pushed to the main branch**.
+ 2. Head over to your repository and click on *Releases* (located at the right-hand side).
+ 3. Click on the green button *Create a new release*.*
+ 4. Click on *Choose a tag*.
+ 5. Fill in the textbox with **SprintX**, where X is the current sprint number, and press *Create new tag: SprintX*.
+ 6. Make sure that *Target: main* or *Target: master* (depending on your main/master branch) is selected, so that the code release will be based on your main branch.
+ 7. Fill in the title of the release with **Group XX Sprint X**, where XX is your group number and X is the current sprint number.
+ 8. Click the *Publish release* button to create a release for your sprint.
+ 9. **Verify** that your release has been successfully created by heading over to your repository and pressing the *Releases* button once again. There you should be able to see your newly created release.
+
+ *After the first release, you should click *Draft a new release* instead of *Create a new release*.
+
+ ## Mypy
+ The template is created with support for full type hints. This enables the use of a powerful tool called `mypy`: code with type hints can be statically checked using it. It is recommended to use this tool, as it can increase confidence in the correctness of the code before testing it. Note that usage of this tool, and of type hints in general, is entirely up to the students and not enforced in any way. To execute the tool, simply run `mypy .`. For more information, see https://mypy.readthedocs.io/en/latest/faq.html
+
+ ## Argparse
+ Argparse functionality is included in the `main.py` file. This means the file can be run from the command line while passing arguments to the main function. Right now, there are arguments included for the number of epochs (`nb_epochs`), the batch size (`batch_size`), and whether to create balanced batches (`balanced_batches`). You are free to add or remove arguments as you see fit.
+
+ To make use of this functionality, first open the command prompt and change to the directory containing the `main.py` file.
+ For example, if your main file is in `C:\Data-Challenge-1-template-main\dc1\`,
+ type `cd C:\Data-Challenge-1-template-main\dc1\` into the command prompt and press enter.
+
+ Then, `main.py` can be run by, for example, typing `python main.py --nb_epochs 10 --batch_size 25`.
+ This would run the script with 10 epochs, a batch size of 25, and balanced batches, which is also the current default.
+ If you would want to run the script with 20 epochs, a batch size of 5, and batches that are not balanced,
+ you would type `python main.py --nb_epochs 20 --batch_size 5 --no-balanced_batches`.
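The command-line interface described above can be sketched with Python's standard `argparse` module. This is a hypothetical minimal reconstruction for illustration; the actual definitions live in `main.py` and may differ in defaults:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the three arguments the README describes.
    parser = argparse.ArgumentParser(description="Train and test the DC1 model")
    parser.add_argument("--nb_epochs", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=25,
                        help="number of samples per batch")
    # BooleanOptionalAction (Python 3.9+) generates both --balanced_batches
    # and --no-balanced_batches flags automatically.
    parser.add_argument("--balanced_batches", action=argparse.BooleanOptionalAction,
                        default=True, help="draw class-balanced batches")
    return parser

# Parse the example command from the README:
args = build_parser().parse_args(
    ["--nb_epochs", "20", "--batch_size", "5", "--no-balanced_batches"]
)
print(args.nb_epochs, args.batch_size, args.balanced_batches)  # 20 5 False
```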
dc1/.DS_Store ADDED
Binary file (8.2 kB)

dc1/__init__.py ADDED
@@ -0,0 +1 @@
+
dc1/__pycache__/__init__.cpython-39.pyc ADDED
Binary file (159 Bytes)

dc1/__pycache__/batch_sampler.cpython-39.pyc ADDED
Binary file (3.05 kB)

dc1/__pycache__/image_dataset.cpython-39.pyc ADDED
Binary file (3.83 kB)

dc1/__pycache__/image_dataset_BINARY.cpython-39.pyc ADDED
Binary file (4.02 kB)

dc1/__pycache__/main.cpython-39.pyc ADDED
Binary file (3.99 kB)

dc1/__pycache__/net.cpython-39.pyc ADDED
Binary file (3.59 kB)

dc1/__pycache__/net_BINARY.cpython-39.pyc ADDED
Binary file (1.38 kB)

dc1/__pycache__/train_test.cpython-39.pyc ADDED
Binary file (2.16 kB)

dc1/__pycache__/visualise_performance_metrics.cpython-39.pyc ADDED
Binary file (2.02 kB)

dc1/artifacts/artifacts/session_02_25_16_52.png ADDED
dc1/artifacts/session_030412_45.png ADDED
dc1/artifacts/session_030622_51.png ADDED
dc1/artifacts/session_031915_58.png ADDED
dc1/batch_sampler.py ADDED
@@ -0,0 +1,81 @@
+ import numpy as np
+ import random
+ import torch
+ from image_dataset import ImageDataset
+ from typing import Generator, Tuple
+
+
+ class BatchSampler:
+     """
+     Implements an iterable which, given a torch dataset and a batch_size,
+     will produce batches of data of that given size. The batches are
+     returned as tuples in the form (images, labels).
+     Can produce balanced batches, where each batch will have an equal
+     amount of samples from each class in the dataset. If your dataset is heavily
+     imbalanced, this might mean throwing away a lot of samples from
+     over-represented classes!
+     """
+
+     def __init__(self, batch_size: int, dataset: ImageDataset, balanced: bool = False) -> None:
+         self.batch_size = batch_size
+         self.dataset = dataset
+         self.balanced = balanced
+         if self.balanced:
+             # Counting the occurrence of the class labels:
+             unique, counts = np.unique(self.dataset.targets, return_counts=True)
+             indexes = []
+             # Sampling an equal amount from each class:
+             for i in range(len(unique)):
+                 indexes.append(
+                     np.random.choice(
+                         np.where(self.dataset.targets == i)[0],
+                         size=counts.min(),
+                         replace=False,
+                     )
+                 )
+             # Setting the indexes we will sample from later:
+             self.indexes = np.concatenate(indexes)
+         else:
+             # Setting the indexes we will sample from later (all indexes):
+             self.indexes = [i for i in range(len(dataset))]
+
+     def __len__(self) -> int:
+         # Ceiling division: a partial final batch still counts as one batch.
+         return -(-len(self.indexes) // self.batch_size)
+
+     def shuffle(self) -> None:
+         random.shuffle(self.indexes)
+
+     def __iter__(self) -> Generator[Tuple[torch.Tensor, torch.Tensor], None, None]:
+         remaining = False
+         self.shuffle()
+         # Go over the dataset in steps of 'self.batch_size':
+         for i in range(0, len(self.indexes), self.batch_size):
+             # If our current batch is larger than the remaining data, we quit:
+             if i + self.batch_size > len(self.indexes):
+                 remaining = True
+                 break
+             # If not, we yield a complete batch:
+             else:
+                 # Getting a list of samples from the dataset, given the indexes we defined:
+                 X_batch = [
+                     self.dataset[self.indexes[k]][0]
+                     for k in range(i, i + self.batch_size)
+                 ]
+                 Y_batch = [
+                     self.dataset[self.indexes[k]][1]
+                     for k in range(i, i + self.batch_size)
+                 ]
+                 # Stacking all the samples and returning the target labels as a tensor:
+                 yield torch.stack(X_batch).float(), torch.tensor(Y_batch).long()
+         # If there is still data left that was not a full batch:
+         if remaining:
+             # Return the last batch (smaller than batch_size):
+             X_batch = [
+                 self.dataset[self.indexes[k]][0] for k in range(i, len(self.indexes))
+             ]
+             Y_batch = [
+                 self.dataset[self.indexes[k]][1] for k in range(i, len(self.indexes))
+             ]
+             yield torch.stack(X_batch).float(), torch.tensor(Y_batch).long()
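The balanced-sampling step above draws an equal number of indexes from every class, capped at the size of the smallest class. The idea can be illustrated without torch or numpy; `balanced_indexes` below is a hypothetical stdlib-only sketch, not the `BatchSampler` class itself:

```python
import random
from collections import defaultdict

def balanced_indexes(targets, seed=0):
    """Return sample indexes with an equal count per class, capped at the
    smallest class size (the idea behind BatchSampler's balanced mode)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)
    # Every class contributes as many samples as the rarest class has:
    smallest = min(len(v) for v in by_class.values())
    picked = []
    for label in sorted(by_class):
        picked.extend(rng.sample(by_class[label], smallest))
    return picked

labels = [0, 0, 0, 1, 1, 2]      # class 2 is rarest, so 1 sample per class
print(len(balanced_indexes(labels)))  # 3
```

Note that, exactly as the class docstring warns, the two extra class-0 samples and one extra class-1 sample are discarded.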
dc1/confusion_matrix.png ADDED
dc1/data/X_train.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a33632464abaa758eb0d3959466dd148972a75d1ca9afc5839335754aabc92c0
+ size 275923072
dc1/data/Y_test.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:163a8d71c898f1556d0dd90ffbd44017f75fb5cc5de968066671f09aa92bc997
+ size 33808
dc1/data/Y_train.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3bbdfe89af848f932ea8fe330d29637a4a0e3f4d451e8ee05bd5e3107877dfb4
+ size 67492
dc1/eda.py ADDED
@@ -0,0 +1,161 @@
+ # # Imports
+ # import numpy as np
+ # import matplotlib.pyplot as plt
+ # import random
+
+ # maincolor = '#4a8cffff'
+ # secondcolor = '#e06666'
+
+ # NOTE: File used at the very beginning of the project. Please ignore!
+
+ # # Relative Path PUT YOUR PATHS HERE
+ # path = 'dc1/data/X_train.npy'
+
+ # data = np.load(path)
+
+
+ # # Display some images to see what we are working on
+ # def display_images(images, num_images=5):
+ #     plt.figure(figsize=(15, 3))
+ #     for i in range(num_images):
+ #         plt.subplot(1, num_images, i + 1)
+ #         plt.imshow(images[i].squeeze(), cmap='gray')
+ #         plt.axis('off')
+ #     plt.show()
+
+
+ # # function call
+ # display_images(data)
+
+
+ # ########################################################################################################################
+
+ # # # 1. Statistical Analysis
+ # # def compute_statistics(images):
+ # #     # flatten the images since x/y are irrelevant
+ # #     flattened_images = images.flatten()
+ # #     mean_val = np.mean(flattened_images)
+ # #     median_val = np.median(flattened_images)
+ # #     std_dev_val = np.std(flattened_images)
+
+ # #     return mean_val, median_val, std_dev_val
+
+
+ # # # Compute and print the statistics
+ # # mean_val, median_val, std_dev_val = compute_statistics(data)
+ # # print(f"Mean pixel intensity: {mean_val}")
+ # # print(f"Median pixel intensity: {median_val}")
+ # # print(f"Standard deviation of pixel intensities: {std_dev_val}")
+
+ # # # Global statistics
+ # # global_mean = np.mean(data)
+ # # global_std = np.std(data)
+ # # # Individual image statistics
+ # # image_means = np.mean(data, axis=(1, 2, 3))
+ # # image_stds = np.std(data, axis=(1, 2, 3))
+ # # # Outlier thresholds
+ # # upper_threshold = global_mean + 3 * global_std
+ # # lower_threshold = global_mean - 3 * global_std
+ # # outlier_indices = np.where((image_means > upper_threshold) | (image_means < lower_threshold))[0]
+ # # print(f"Found {len(outlier_indices)} potential outliers based on pixel intensity means.")
+
+
+ # # ########################################################################################################################
+
+ # # # 2. Histogram Analysis
+ # # def plot_histogram(images, title="Pixel Intensity Distribution"):
+ # #     flattened_images = images.flatten()
+
+ # #     # Customize plot aesthetics
+ # #     plt.figure(figsize=(10, 6))
+ # #     plt.hist(flattened_images, bins=256, range=(0, 255), color=maincolor, alpha=0.75)
+
+ # #     # Adding grid, title, and labels with improved aesthetics
+ # #     plt.grid(axis='y', alpha=0.75)
+ # #     plt.title(title, fontsize=15, color='#333333')
+ # #     plt.xlabel('Pixel Intensity', fontsize=12, color='#333333')
+ # #     plt.ylabel('Frequency', fontsize=12, color='#333333')
+
+ # #     # Customizing tick marks for better readability
+ # #     plt.xticks(fontsize=10, color='#333333')
+ # #     plt.yticks(fontsize=10, color='#333333')
+
+ # #     # Adding a background color to the plot for contrast
+ # #     ax = plt.gca()  # Get current axes
+ # #     ax.set_facecolor('#f0f0f0')
+ # #     ax.figure.set_facecolor('#f8f8f8')
+
+ # #     # Add a border around the plot for a more polished look
+ # #     for spine in ax.spines.values():
+ # #         spine.set_edgecolor('#d0d0d0')
+
+ # #     plt.show()
+
+ # # # Plot histogram for the entire dataset
+ # # plot_histogram(data, title="Pixel Intensity Distribution Across Entire Dataset")
+ # # # Plot a selected image
+ # # plot_histogram(data[10], title="Pixel Intensity Distribution of a Selected Image")
+
+
+ # def plot_histogram_with_images(images, num_images=5):
+ #     # Select a set of random images
+ #     random_indices = random.sample(range(images.shape[0]), num_images)
+
+ #     for index in random_indices:
+ #         # Extract a single image
+ #         single_xray_image = images[index]
+
+ #         # Flatten the image for the histogram
+ #         flattened_image = single_xray_image.flatten()
+
+ #         # Create a figure with 2 subplots
+ #         fig, axs = plt.subplots(1, 2, figsize=(12, 6))
+
+ #         # Plot histogram on the first subplot
+ #         axs[0].hist(flattened_image, bins=256, range=(0, 255), color=maincolor, alpha=0.75)
+ #         axs[0].set_title('Pixel Intensity Distribution')
+ #         axs[0].set_xlabel('Pixel Intensity')
+ #         axs[0].set_ylabel('Frequency')
+ #         axs[0].set_ylim(0, 600)
+ #         axs[0].grid(True)
+
+ #         # Show the image on the second subplot
+ #         axs[1].imshow(single_xray_image.squeeze(), cmap='gray')
+ #         axs[1].set_title('X-Ray Image')
+ #         axs[1].axis('off')
+
+ #         plt.tight_layout()
+ #         plt.show()
+
+ # plot_histogram_with_images(data)
+
+
+ # # 3. Plot for Accuracy, Precision and Recall
+ # def plot_metrics_evolution(epochs, accuracy, precision):
+ #     import matplotlib.pyplot as plt
+
+ #     plt.rcParams.update({'font.size': 12})
+
+ #     plt.figure(figsize=(12, 8))
+
+ #     plt.plot(epochs, accuracy, label='Accuracy', marker='o', linestyle='-', color=maincolor)
+ #     plt.plot(epochs, precision, label='Precision', marker='s', linestyle='--', color=secondcolor)
+ #     # plt.plot(epochs, recall, label='Recall', marker='^', linestyle='-.', color='red')
+
+ #     plt.title('Model Performance Over 10 Epochs')
+ #     plt.xlabel('Epoch')
+ #     plt.ylabel('Score')
+ #     plt.xticks(epochs)
+
+ #     plt.legend()
+ #     plt.grid(True)
+ #     plt.show()
+
+ # # Data
+ # epochs = list(range(1, 11))
+ # accuracy = [0.1884, 0.1968, 0.1985, 0.2200, 0.2122, 0.2208, 0.2340, 0.2337, 0.2318, 0.2384]
+ # precision = [0.1664, 0.3518, 0.2644, 0.3144, 0.3137, 0.3212, 0.2983, 0.3108, 0.2635, 0.3081]
+ # # recall = [0.1884, 0.1968, 0.1985, 0.2200, 0.2122, 0.2208, 0.2340, 0.2337, 0.2318, 0.2384]
+
+ # # Example function call
+ # plot_metrics_evolution(epochs, accuracy, precision)
dc1/image_dataset.py ADDED
@@ -0,0 +1,114 @@
+ import numpy as np
+ import torch
+ import torchvision.transforms as T
+ import torchvision.transforms.functional as TF
+ import requests
+ import io
+ from os import path
+ from typing import Tuple, List
+ from pathlib import Path
+ import os
+
+
+ class ImageDataset:
+     """
+     Creates a DataSet from numpy arrays while keeping the data
+     in the more efficient numpy arrays for as long as possible and only
+     converting to torch tensors when needed (torch tensors are the objects used
+     to pass the data through the neural network and apply weights).
+     """
+
+     def __init__(self, x: Path, y: Path) -> None:
+         # Target labels
+         self.targets = ImageDataset.load_numpy_arr_from_npy(y)
+         # Images
+         self.imgs = ImageDataset.load_numpy_arr_from_npy(x)
+
+     def __len__(self) -> int:
+         return len(self.targets)
+
+     def __getitem__(self, idx: int) -> Tuple[torch.Tensor, np.ndarray]:
+         # Template code
+         image = torch.from_numpy(self.imgs[idx] / 255).float()
+         label = self.targets[idx]
+
+         # Preprocessing
+         # Metrics for normalization of the images
+         mean = image.mean()
+         std = image.std()
+
+         # Compose: composes several transforms together (torch documentation)
+         compose = T.Compose([
+             T.Normalize(mean, std),  # Normalization
+             T.Resize(156),  # Resizing to 156x156
+             T.CenterCrop(128),  # Cropping to focus on the center 128x128 region
+             T.Lambda(lambda x: TF.rotate(x, angle=90)),  # Rotating by 90 degrees
+             T.RandomHorizontalFlip(p=0.5),  # Random horizontal flip with a 50% probability
+             T.RandomVerticalFlip(p=0.5),  # Random vertical flip with a 50% probability
+             T.Lambda(lambda x: x + torch.randn_like(x) * 0.1)  # Adding random noise
+         ])
+
+         # Apply the transformations defined by compose
+         image = compose(image)
+
+         return image, label
+
+     def get_labels(self) -> List[np.ndarray]:
+         return self.targets.tolist()
+
+     @staticmethod
+     def load_numpy_arr_from_npy(path: Path) -> np.ndarray:
+         """
+         Loads a numpy array from local storage.
+
+         Input:
+         path: local path of file
+
+         Outputs:
+         dataset: numpy array with input features or labels
+         """
+
+         return np.load(path)
+
+
+ def load_numpy_arr_from_url(url: str) -> np.ndarray:
+     """
+     Loads a numpy array from surfdrive.
+
+     Input:
+     url: Download link of dataset
+
+     Outputs:
+     dataset: numpy array with input features or labels
+     """
+
+     response = requests.get(url)
+     response.raise_for_status()
+
+     return np.load(io.BytesIO(response.content))
+
+
+ if __name__ == "__main__":
+     cwd = os.getcwd()
+     if path.exists(path.join(cwd, "data")):
+         print("Data directory exists, files may be overwritten!")
+     else:
+         os.mkdir(path.join(cwd, "data"))
+     ### Load labels
+     train_y = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/i6MvQ8nqoiQ9Tci/download"
+     )
+     np.save("data/Y_train.npy", train_y)
+     test_y = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/wLXiOjVAW4AWlXY/download"
+     )
+     np.save("data/Y_test.npy", test_y)
+     ### Load data
+     train_x = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/4rwSf9SYO1ydGtK/download"
+     )
+     np.save("data/X_train.npy", train_x)
+     test_x = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/dvY2LpvFo6dHef0/download"
+     )
+     np.save("data/X_test.npy", test_x)
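The `T.Normalize(mean, std)` step in `__getitem__` standardizes each image with its own statistics, so every image ends up zero-mean and unit-variance before augmentation. A stdlib-only sketch of that idea on a flat list of pixel values (`normalize` is a hypothetical helper for illustration, not part of the repository):

```python
def normalize(pixels):
    """Standardize pixel values: subtract the mean, divide by the standard
    deviation -- the per-image role T.Normalize(mean, std) plays above."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = (sum((p - mean) ** 2 for p in pixels) / n) ** 0.5
    if std == 0:  # guard against constant images
        std = 1.0
    return [(p - mean) / std for p in pixels]

z = normalize([0, 50, 100, 150, 200])
print(round(sum(z), 6))  # 0.0 -- the standardized image is zero-mean
```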
dc1/image_dataset_BINARY.py ADDED
@@ -0,0 +1,125 @@
+ import numpy as np
+ import torch
+ import torchvision.transforms as T
+ import torchvision.transforms.functional as TF
+ import requests
+ import io
+ from os import path
+ from typing import Tuple, List
+ from pathlib import Path
+ import os
+
+
+ class ImageDatasetBINARY:
+     """
+     Creates a DataSet from numpy arrays while keeping the data
+     in the more efficient numpy arrays for as long as possible and only
+     converting to torch tensors when needed (torch tensors are the objects used
+     to pass the data through the neural network and apply weights).
+     """
+
+     def __init__(self, x: Path, y: Path) -> None:
+         # Target labels
+         self.targets = ImageDatasetBINARY.load_numpy_arr_from_npy(y)
+         # Images
+         self.imgs = ImageDatasetBINARY.load_numpy_arr_from_npy(x)
+         # Division into:
+         # SICK = 1
+         self.targets[self.targets == 0] = 1  # Atelectasis to SICK
+         self.targets[self.targets == 1] = 1  # Effusion to SICK
+         self.targets[self.targets == 2] = 1  # Infiltration to SICK
+         self.targets[self.targets == 4] = 1  # Nodule to SICK
+         self.targets[self.targets == 5] = 1  # Pneumonia to SICK
+
+         # NON SICK = 0
+         self.targets[self.targets == 3] = 0  # No Finding to NON SICK
+
+     def __len__(self) -> int:
+         return len(self.targets)
+
+     def __getitem__(self, idx: int) -> Tuple[torch.Tensor, np.ndarray]:
+         # Template code
+         image = torch.from_numpy(self.imgs[idx] / 255).float()
+         label = self.targets[idx]
+
+         # Metrics for normalization of the images
+         mean = image.mean()
+         std = image.std()
+
+         # Compose: composes several transforms together (torch documentation)
+         compose = T.Compose([
+             T.Normalize(mean, std),  # Normalization
+             T.Resize(156),  # Resizing to 156x156
+             T.CenterCrop(128),  # Cropping to focus on the center 128x128 region
+             T.Lambda(lambda x: TF.rotate(x, angle=90)),  # Rotating by 90 degrees
+             T.RandomHorizontalFlip(p=0.5),  # Random horizontal flip with a 50% probability
+             T.RandomVerticalFlip(p=0.5),  # Random vertical flip with a 50% probability
+             T.Lambda(lambda x: x + torch.randn_like(x) * 0.1)  # Adding random noise
+         ])
+
+         # Apply the transformations defined by compose
+         image = compose(image)
+
+         return image, label
+
+     def get_labels(self) -> List[np.ndarray]:
+         return self.targets.tolist()
+
+     @staticmethod
+     def load_numpy_arr_from_npy(path: Path) -> np.ndarray:
+         """
+         Loads a numpy array from local storage.
+
+         Input:
+         path: local path of file
+
+         Outputs:
+         dataset: numpy array with input features or labels
+         """
+
+         return np.load(path)
+
+
+ def load_numpy_arr_from_url(url: str) -> np.ndarray:
+     """
+     Loads a numpy array from surfdrive.
+
+     Input:
+     url: Download link of dataset
+
+     Outputs:
+     dataset: numpy array with input features or labels
+     """
+
+     response = requests.get(url)
+     response.raise_for_status()
+
+     return np.load(io.BytesIO(response.content))
+
+
+ if __name__ == "__main__":
+     cwd = os.getcwd()
+     if path.exists(path.join(cwd, "data")):
+         print("Data directory exists, files may be overwritten!")
+     else:
+         os.mkdir(path.join(cwd, "data"))
+     ### Load labels
+     train_y = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/i6MvQ8nqoiQ9Tci/download"
+     )
+     np.save("data/Y_train.npy", train_y)
+     test_y = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/wLXiOjVAW4AWlXY/download"
+     )
+     np.save("data/Y_test.npy", test_y)
+     ### Load data
+     train_x = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/4rwSf9SYO1ydGtK/download"
+     )
+     np.save("data/X_train.npy", train_x)
+     test_x = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/dvY2LpvFo6dHef0/download"
+     )
+     np.save("data/X_test.npy", test_x)
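The relabelling block in `ImageDatasetBINARY.__init__` boils down to one rule: label 3 ("No Finding") becomes 0 (NON SICK), every other class becomes 1 (SICK). A minimal stdlib sketch of that mapping (`to_binary` is a hypothetical helper, shown only to make the rule explicit):

```python
def to_binary(labels):
    # Mirror of the remapping in ImageDatasetBINARY:
    # 3 ("No Finding") -> 0 (NON SICK); all other classes -> 1 (SICK).
    return [0 if y == 3 else 1 for y in labels]

print(to_binary([0, 1, 2, 3, 4, 5]))  # [1, 1, 1, 0, 1, 1]
```

Expressing the rule as a single conditional also avoids the ordering pitfalls of sequential in-place masks (which happen to be safe in the original only because label 3 is remapped last).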
dc1/main.py ADDED
@@ -0,0 +1,256 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Custom imports
2
+ from batch_sampler import BatchSampler
3
+ from image_dataset import ImageDataset
4
+ from net import Net, ResNetModel, EfficientNetModel, EfficientNetModel_b7
5
+ from train_test import train_model, test_model
6
+ from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay,classification_report
7
+ from visualise_performance_metrics import create_confusion_matrix, ROC_multiclass
8
+ from image_dataset_BINARY import ImageDatasetBINARY
9
+ from net_BINARY import Net_BINARY
10
+ # Torch imports
11
+ import torch
12
+ import torch.nn as nn
13
+ import torch.optim as optim
14
+ from torchsummary import summary
15
+
16
+ # Other imports
17
+ import matplotlib.pyplot as plt
18
+ from matplotlib.pyplot import figure
19
+ import os
20
+ import argparse
21
+ import plotext
22
+ from datetime import datetime
23
+ from pathlib import Path
24
+ from typing import List
25
+ from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_curve
26
+ import numpy as np
27
+
28
+ def main(args: argparse.Namespace, activeloop: bool = True) -> None:
29
+ # NOTE: Uncomment the dataset you would like to use. For instance, if you would like to run the Binary Model,
30
+ # you would need to comment the ImageDataset Class lines, or viceversa if you are running the other models.
31
+ # Load the train and test data set
32
+ train_dataset = ImageDataset(Path('dc1/data/X_train.npy'), Path('dc1/data/Y_train.npy'))
33
+ test_dataset = ImageDataset(Path('dc1/data/X_test.npy'), Path('dc1/data/Y_test.npy'))
34
+
35
+ # Load the BINARY train and BINARY test data set
36
+ # train_dataset = ImageDatasetBINARY(Path('dc1/data/X_train.npy'), Path('dc1/data/Y_train.npy'))
37
+ # test_dataset = ImageDatasetBINARY(Path('dc1/data/X_test.npy'), Path('dc1/data/Y_test.npy'))
38
+
39
+ # Load the Neural Net.
40
+ # NOTE: set number of distinct labels here
41
+ # NOTE: uncomment when you need to use one of the models
42
+ # Improved Net
43
+ # model = Net(n_classes=6)
44
+ # ResNet Pre-Trained Model
45
+ # model = ResNetModel(n_classes=6)
46
+ # EfficientNet Model Pre-Trained == OUR SELECTED MODEL
47
+ model = EfficientNetModel(n_classes=6)
48
+ # Binary Model
49
+ # model = Net_BINARY(n_classes=2)
50
+
51
+ # Initialize optimizer(s) and loss function(s)
52
+ optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.1)
53
+ optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.1)
54
+ loss_function = nn.CrossEntropyLoss()
55
+
# fetch epoch and batch count from arguments
n_epochs = args.nb_epochs
batch_size = args.batch_size

# IMPORTANT! Set this to True to see actual errors regarding
# the structure of your model (GPU acceleration hides them)!
# Also make sure you set this to False again for actual model training,
# as training your model with GPU acceleration (CUDA/MPS) is much faster.
DEBUG = False

# fpr = {x: [] for x in range(6)}
# tpr = {x: [] for x in range(6)}
# auc = {}

# Moving our model to the right device (CUDA will speed training up significantly!)
if torch.cuda.is_available() and not DEBUG:
    print("@@@ CUDA device found, enabling CUDA training...")
    device = "cuda"
    model.to(device)
    # Creating a summary of our model and its layers:
    summary(model, (1, 128, 128), device=device)
elif (
    torch.backends.mps.is_available() and not DEBUG
):  # PyTorch supports Apple Silicon GPUs from version 1.12
    print("@@@ Apple silicon device enabled, training with Metal backend...")
    device = "mps"
    model.to(device)
else:
    print("@@@ No GPU boosting device found, training on CPU...")
    device = "cpu"
    # Creating a summary of our model and its layers:
    summary(model, (1, 128, 128), device=device)

# Let's now train and test our model for multiple epochs:
train_sampler = BatchSampler(
    batch_size=batch_size, dataset=train_dataset, balanced=args.balanced_batches
)
test_sampler = BatchSampler(
    batch_size=100, dataset=test_dataset, balanced=args.balanced_batches
)

mean_losses_train: List[torch.Tensor] = []
mean_losses_test: List[torch.Tensor] = []

for e in range(n_epochs):
    if activeloop:
        # Training:
        losses = train_model(model, train_sampler, optimizer, loss_function, device)
        # Calculating and printing statistics:
        mean_loss = sum(losses) / len(losses)
        mean_losses_train.append(mean_loss)
        print(f"\nEpoch {e + 1} training done, loss on train set: {mean_loss}\n")

        # Testing:
        # losses, y_pred_probs = test_model(model, test_sampler, loss_function, device)
        fpr = {x: [] for x in range(6)}
        tpr = {x: [] for x in range(6)}
        auc = {}

        losses, y_pred_probs = test_model(model, test_sampler, loss_function, device, fpr, tpr, auc)

        # Calculating and printing statistics:
        mean_loss = sum(losses) / len(losses)
        mean_losses_test.append(mean_loss)
        print(f"\nEpoch {e + 1} testing done, loss on test set: {mean_loss}\n")

        print(auc)

        ### Plotting during training
        plotext.clf()
        plotext.scatter(mean_losses_train, label="train")
        plotext.scatter(mean_losses_test, label="test")
        plotext.title("Train and test loss")

        plotext.xticks([i for i in range(len(mean_losses_train) + 1)])

        plotext.show()


##################################################################################################################
#                                         R O C   C U R V E S
##################################################################################################################
# NOTE: If you would like to run the ROC function for the Binary dataset, you need to comment the following code
# and uncomment the # ROC CURVE FOR BINARY block. Additionally, in order to make the Binary ROC curve, you need
# to comment some lines in the train_test.py file, as specified there. Please check that file.

# ROC CURVE FOR MULTICLASS
plt.figure(figsize=(8, 6))

colors = plt.cm.get_cmap('viridis', 6).colors
class_names = ['Class 0 (Atelectasis)', 'Class 1 (Effusion)', 'Class 2 (Infiltration)', 'Class 3 (No Finding)', 'Class 4 (Nodule)', 'Class 5 (Pneumonia)']

for i, color in zip(range(6), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=2, label='{} (AUC = {:.2f})'.format(class_names[i], auc[i]))

plt.plot([0, 1], [0, 1], color='gray', lw=1, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves for 6 Classes')
plt.legend(loc="lower right")
plt.show()

# ROC CURVE FOR BINARY
# for i in range(2):
#     plt.plot(fpr[i], tpr[i], label=f'Class {i} (AUC = {auc[i]:.2f})')

# plt.plot([0, 1], [0, 1], 'k--', label='Random chance')
# plt.xlabel('False Positive Rate')
# plt.ylabel('True Positive Rate')
# plt.title('ROC Curves for 2 Classes (1=Sick, 0=Non-sick)')
# plt.legend(loc='lower right')
# plt.show()


# retrieve current time to label artifacts
now = datetime.now()
# check if model_weights/ subdir exists
if not Path("model_weights/").exists():
    os.mkdir(Path("model_weights/"))

# Saving the model
torch.save(model.state_dict(), f"model_weights/model_{now.month:02}{now.day:02}{now.hour}_{now.minute:02}.txt")

# Create plot of losses
figure(figsize=(9, 10), dpi=80)
fig, (ax1, ax2) = plt.subplots(2, sharex=True)

ax1.plot(range(1, 1 + n_epochs), [x.detach().cpu() for x in mean_losses_train], label="Train", color="blue")
ax2.plot(range(1, 1 + n_epochs), [x.detach().cpu() for x in mean_losses_test], label="Test", color="red")
fig.legend()

# Check if /artifacts/ subdir exists
if not Path("artifacts/").exists():
    os.mkdir(Path("artifacts/"))

# save plot of losses
fig.savefig(Path("artifacts") / f"session_{now.month:02}{now.day:02}{now.hour}_{now.minute:02}.png")


##################################################################################################################
#             C O N F U S I O N   M A T R I X   &   C L A S S I F I C A T I O N   R E P O R T
##################################################################################################################
true_labels = test_dataset.get_labels()

# Set the model to evaluation mode
model.eval()

predicted_labels = []
with torch.no_grad():
    for inputs, _ in test_dataset:
        inputs = inputs.unsqueeze(0).to(device)

        outputs = model(inputs)

        # Get predicted labels by taking the max value (i.e. the most likely class)
        _, predicted = torch.max(outputs, 1)
        predicted_labels.extend(predicted.cpu().numpy())

# Calculate Confusion Matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)

print("Confusion Matrix:")
print(conf_matrix)
# plot the confusion matrix
# fig, ax = plt.subplots()
# ConfusionMatrixDisplay(confusion_matrix=conf_matrix).plot(ax=ax, cmap="Blues")
# plt.show()
# plt.savefig('confusion_matrix.png')
create_confusion_matrix(true_labels, predicted_labels)

# Classification report (accuracy, precision, f1 etc.)
class_report = classification_report(true_labels, predicted_labels)
print("\nClassification Report:")
print(class_report)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "--nb_epochs", help="number of training iterations", default=1, type=int)
    parser.add_argument("--batch_size", help="batch_size", default=25, type=int)
    parser.add_argument(
        "--balanced_batches",
        help="whether to balance batches for class labels",
        # NOTE: argparse's type=bool treats any non-empty string (including "False") as True;
        # edit the default here rather than passing the flag on the command line.
        default=True,
        type=bool,
    )
    args = parser.parse_args()

    main(args)
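For reference on the multiclass ROC block above: each `auc[i]` is the area under that class's (fpr, tpr) curve, and `sklearn.metrics.auc` computes it with the trapezoidal rule. A minimal sketch on hypothetical points (the array values below are made up for illustration):

```python
import numpy as np

# Hypothetical (fpr, tpr) points for one class, in the shape that
# fpr[i] / tpr[i] hold after test_model fills them in.
fpr_i = np.array([0.0, 0.1, 0.4, 1.0])
tpr_i = np.array([0.0, 0.6, 0.8, 1.0])

# Trapezoidal area under the ROC curve (what sklearn.metrics.auc computes).
auc_i = float(np.sum((fpr_i[1:] - fpr_i[:-1]) * (tpr_i[1:] + tpr_i[:-1]) / 2))
print(round(auc_i, 3))  # → 0.78
```

An AUC of 0.5 corresponds to the gray diagonal (random chance) drawn in the plot.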
dc1/model_weights/model_02_25_16_52.txt ADDED
Binary file (333 kB).
 
dc1/net.py ADDED
@@ -0,0 +1,113 @@
import torch
import torch.nn as nn
import torchvision.models as models

# Improved Net
class Net(nn.Module):
    def __init__(self, n_classes: int) -> None:
        super(Net, self).__init__()

        self.cnn_layers = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2),
            nn.Dropout(p=0.5),

            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2),
            nn.Dropout(p=0.25),

            nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2),
            nn.Dropout(p=0.125),

            nn.Conv2d(16, 8, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2),
            nn.Dropout(p=0.1),

            nn.Conv2d(8, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(p=0.05),

            # New layer
            nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(p=0.05),
        )

        self.linear_layers = nn.Sequential(
            nn.Linear(16, 256),
            nn.Linear(256, n_classes)
        )

    # Defining the forward pass
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cnn_layers(x)
        # After our convolutional layers, which are 2D, we need to flatten our
        # input to be 1-dimensional, as the linear layers require this.
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x


# Implementing Pre-Trained ResNet as a class
class ResNetModel(nn.Module):
    def __init__(self, n_classes: int, pretrained: bool = True):
        # Loading a pre-trained ResNet model
        super(ResNetModel, self).__init__()
        self.resnet = models.resnet34(pretrained=pretrained)
        self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=4, stride=(2, 2), padding=(3, 3), bias=False)
        num_ftrs = self.resnet.fc.in_features
        self.resnet.fc = nn.Linear(num_ftrs, n_classes)

    def forward(self, x):
        # Forward pass through the ResNet model
        return self.resnet(x)

# Implementing the EfficientNet model with efficientnet_b0
class EfficientNetModel(nn.Module):
    def __init__(self, n_classes: int, version: str = 'b0', pretrained: bool = True):
        super(EfficientNetModel, self).__init__()
        # Loading a pretrained EfficientNet model
        self.efficientnet = models.efficientnet_b0(pretrained=pretrained) if version == 'b0' else models.__dict__[f'efficientnet_{version}'](pretrained=pretrained)

        # Adjusting the classifier to match the number of classes
        num_ftrs = self.efficientnet.classifier[1].in_features
        self.efficientnet.classifier[1] = nn.Linear(num_ftrs, n_classes)

    def forward(self, x):
        # Forward pass through the EfficientNet model
        # Replicating the grayscale channel to have 3 channels
        x = x.repeat(1, 3, 1, 1)
        return self.efficientnet(x)

# Implementing the EfficientNet model with efficientnet_b7
# NOTE: This model takes a lot of time to run
class EfficientNetModel_b7(nn.Module):
    def __init__(self, n_classes: int, version: str = 'b7', pretrained: bool = True):
        super(EfficientNetModel_b7, self).__init__()
        # Loading a pretrained EfficientNet model
        self.efficientnet = models.efficientnet_b7(pretrained=pretrained) if version == 'b7' else models.__dict__[f'efficientnet_{version}'](pretrained=pretrained)

        # Adjusting the classifier to match the number of classes
        num_ftrs = self.efficientnet.classifier[1].in_features
        self.efficientnet.classifier[1] = nn.Linear(num_ftrs, n_classes)

    def forward(self, x):
        # Forward pass through the EfficientNet model
        # Replicating the grayscale channel to have 3 channels
        x = x.repeat(1, 3, 1, 1)
        return self.efficientnet(x)

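A quick sanity check for the `nn.Linear(16, 256)` input size in `Net` above (a sketch, assuming the 1×128×128 inputs that `summary(model, (1, 128, 128))` uses in `main.py`): the padded 3×3 convolutions preserve height and width, and each of the six `kernel_size=2` pooling layers halves them, so the flattened feature count works out as:

```python
# Flattened feature count after Net's cnn_layers for a 1x128x128 input:
# six pooling layers with kernel_size=2 halve H and W each time,
# and the last conv block outputs 4 channels.
size = 128
for _ in range(6):
    size //= 2          # 128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2
flat_features = 4 * size * size
print(flat_features)  # → 16, matching nn.Linear(16, 256)
```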
dc1/net_BINARY.py ADDED
@@ -0,0 +1,48 @@
import torch
import torch.nn as nn
import torchvision.models as models


# ORIGINAL NET (from template)
class Net_BINARY(nn.Module):
    def __init__(self, n_classes: int) -> None:
        super(Net_BINARY, self).__init__()

        self.cnn_layers = nn.Sequential(
            # Defining a 2D convolution layer
            nn.Conv2d(1, 32, kernel_size=4, stride=1),
            nn.PReLU(),
            nn.BatchNorm2d(32),
            nn.ReLU6(inplace=True),
            nn.AvgPool2d(kernel_size=3),
            torch.nn.Dropout(p=0.5, inplace=True),
            # Defining another 2D convolution layer
            nn.Conv2d(32, 64, kernel_size=4, stride=1),
            nn.PReLU(),
            nn.BatchNorm2d(64),
            nn.ReLU6(inplace=True),
            nn.AvgPool2d(kernel_size=3),
            torch.nn.Dropout(p=0.25, inplace=True),
            # Defining another 2D convolution layer
            nn.Conv2d(64, 128, kernel_size=3, stride=1),
            nn.PReLU(),
            nn.BatchNorm2d(128),
            nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=3),
            torch.nn.Dropout(p=0.125, inplace=True),
        )

        self.linear_layers = nn.Sequential(
            nn.Linear(1152, 312),
            nn.Linear(312, n_classes)
        )

    # Defining the forward pass
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cnn_layers(x)
        # After our convolutional layers, which are 2D, we need to flatten our
        # input to be 1-dimensional, as the linear layers require this.
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x

dc1/perfomance_metrics.py ADDED
@@ -0,0 +1,148 @@
from pathlib import Path
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, roc_curve, auc, classification_report, RocCurveDisplay, roc_auc_score
from sklearn.preprocessing import label_binarize, LabelBinarizer
from train_test import test_model
import torch
from net import Net
from batch_sampler import BatchSampler
from image_dataset import ImageDataset
from itertools import cycle
from scipy import interp
import numpy as np
import seaborn as sns

# NOTE: File used in the beginning of the project. Please ignore!

def ConfusionMatrix(y_pred, y):
    # Obtaining the predicted data
    y_pred = y_pred.cpu()
    y = y.cpu()
    reshaped = y.reshape(-1)

    # Plot Confusion Matrix
    report = classification_report(y, y_pred, zero_division=1)
    print(report)
    conf = confusion_matrix(reshaped, y_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=conf)

    FP = conf.sum(axis=0) - np.diag(conf)
    FN = conf.sum(axis=1) - np.diag(conf)
    TP = np.diag(conf)
    TN = conf.sum() - (FP + FN + TP)

    return disp, FP, FN, TP, TN


def ROC(y_pred_prob, y_pred, y):
    prob_reshape = y_pred_prob.cpu().reshape(-1)
    y_pred = y_pred.cpu()
    reshaped = y.cpu().reshape(-1)
    y_pred_prob = y_pred_prob.cpu().numpy()  # Convert to NumPy array
    y_pred = y_pred.cpu().numpy()  # Convert to NumPy array
    y = y.cpu().numpy()

    binary = []
    for i in range(len(y_pred)):
        if (y_pred[i] == reshaped[i]):
            binary.append(1)
        else:
            binary.append(0)
    fpr, tpr, threshold = metrics.roc_curve(binary, prob_reshape[:86])
    roc_auc = metrics.auc(fpr, tpr)
    disp_roc = metrics.RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc)

    return disp_roc


# def ROC2(y_train, y_test, y_score):
#     unique, counts = np.unique(np.concatenate((y_train, y_test)), return_counts=True)
#     print(dict(zip(unique, counts)))

#     label_binarizer = LabelBinarizer().fit(y_train)

#     y_onehot_test = label_binarizer.transform(y_test)
#     n_classes = len(label_binarizer.classes_)

#     class_off_interest = 1
#     class_id = np.flatnonzero(label_binarizer.classes_ == class_off_interest)[0]

#     fig, ax = plt.subplots(figsize=(6, 6))
#     target_names = ["Atelectasis", "Effusion", "Infiltration", "No Finding", "Nodule", "Pneumothorax"]
#     colors = cycle(["purple", "darkorange", "cornflowerblue", "red", "green", "darkblue"])
#     for class_id, color in zip(range(n_classes), colors):
#         RocCurveDisplay.from_predictions(
#             y_onehot_test[:, class_id],
#             y_score[:, class_id],
#             name=f"ROC curve for {target_names[class_id]}",
#             color=color,
#             ax=ax
#         )

#     plt.plot([0, 1], [0, 1], "k--", label="ROC curve for chance level (AUC = 0.5)")

#     return fig

# def ROC2(y_true, y_pred_prob, n_classes):
#     lb = LabelBinarizer()
#     y_true_binarized = lb.fit_transform(y_true)  # Binarize y_true
#     print("Shape of y_pred_prob:", y_pred_prob.shape)

#     fig, ax = plt.subplots(figsize=(8, 6))  # Prepare a figure for plotting

#     # Iterate over each class to calculate ROC
#     for i in range(n_classes):
#         y_true_class = y_true_binarized[:, i]  # True labels for class i
#         y_pred_class = y_pred_prob[:, i]  # Predicted probabilities for class i

#         # Calculate ROC curve
#         fpr, tpr, thresholds = roc_curve(y_true_class, y_pred_class)
#         roc_auc = auc(fpr, tpr)

#         # Plot ROC curve
#         RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc).plot(ax=ax)

#     plt.title("Multiclass ROC Curve")
#     plt.show()

#     return fig

def ROC_multiclass(y_true, y_pred_prob, n_classes):
    # Binarize the output
    y_true = label_binarize(y_true, classes=[*range(n_classes)])
    fpr = dict()
    tpr = dict()
    roc_auc = dict()

    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_true[:, i], y_pred_prob[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Plot ROC curves
    plt.figure()
    colors = cycle(['blue', 'red', 'green', 'yellow', 'orange', 'purple'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=2,
                 label='ROC curve of class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))

    plt.plot([0, 1], [0, 1], 'k--', lw=2)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Multiclass ROC')
    plt.legend(loc="lower right")
    plt.show()

    return plt
@@ -0,0 +1,402 @@
 
# # # # Imports
# # # import torch
# # # import numpy as np
# # # import pandas as pd
# # # import matplotlib.pyplot as plt
# # # import seaborn as sns

# # # from sklearn.metrics import confusion_matrix, roc_curve, auc
# # # from typing import Callable, List, Tuple
# # # import torch.nn as nn
# # # from pathlib import Path
# # # import torch.nn.functional as F
# # # from yaml import FlowSequenceStartToken

# # Import files
# from image_dataset import ImageDataset
# from net import Net, ResNetModel, EfficientNetModel
# from train_test import train_model, test_model
# from batch_sampler import BatchSampler

# NOTE: File used in the very beginning of the project. Please ignore!

# maincolor = '#4a8cffff'
# secondcolor = '#e06666'

# # Train data
# labels_train_path = 'dc1/data/Y_train.npy'
# data_train_path = 'dc1/data/X_train.npy'
# # Test data
# labels_test_path = 'dc1/data/Y_test.npy'
# data_test_path = 'dc1/data/X_test.npy'


# y_train = np.load(labels_train_path)
# unique_labels = np.unique(y_train)
# data_train = np.load(data_train_path)


# # Data Verification to check if we all have everything good
# data_shape = data_train.shape
# data_type = data_train.dtype
# labels_shape = y_train.shape
# labels_type = y_train.dtype
# print(f"Data Shape: {data_shape}, Data Type: {data_type}")
# print(f"Labels Shape: {labels_shape}, Labels Type: {labels_type}")

# # Check the range and distribution of features
# data_range = (np.min(data_train), np.max(data_train))

# # Label Encoding in accordance to the diseases
# class_names_mapping = {
#     0: 'Atelectasis',
#     1: 'Effusion',
#     2: 'Infiltration',
#     3: 'No Finding',
#     4: 'Nodule',
#     5: 'Pneumonia'
# }

# print("Unique classes in the training set:")
# for class_id in unique_labels:
#     print(f"Class ID {class_id}: {class_names_mapping[class_id]}")

# # df for distribution analysis
# df_data_range = pd.DataFrame(data_train.reshape(data_train.shape[0], -1))

# ###################################################################
# ###########   A D V A N C E D   A N A L Y S I S   ###########
# ##################################################################

# # Y test data (labels)
# y_test = np.load(labels_test_path)

# # Initialize model (NET)
# n_classes = 6
# # NOTE : change the nn here!
# model = Net(n_classes=n_classes)
# # model = ResNetModel(n_classes=n_classes)
# # model = EfficientNetModel(n_classes=n_classes)

# # Device for test_model function call
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model.to(device)

# # Initialize the loss function
# loss_function = nn.CrossEntropyLoss()  # we can use another; this one I found on the internet but I was getting errors...

# # # Load test dataset w function
# # test_dataset = ImageDataset(Path("dc1/data/X_test.npy"), Path("dc1/data/Y_test.npy"))

# # # Initialize the BatchSampler
# # batch_size = 32
# # test_loader = BatchSampler(batch_size=batch_size, dataset=test_dataset, balanced=False)  # 'balanced' or not we can choose depending on what we want

# # # Function call
# # losses, predicted_labels, true_labels, probabilities = test_model(model, test_loader, loss_function, device)

# ##################### R O C   C U R V E #####################
# def plot_multiclass_roc_curve(y_true, y_scores, num_classes):
#     # Compute ROC curve and ROC area for each class
#     fpr = dict()
#     tpr = dict()
#     roc_auc = dict()

#     for i in range(num_classes):
#         fpr[i], tpr[i], _ = roc_curve(y_true[:, i], y_scores[:, i])
#         roc_auc[i] = auc(fpr[i], tpr[i])

#     # Plot all ROC curves
#     plt.figure()
#     for i in range(num_classes):
#         plt.plot(fpr[i], tpr[i], label=f'ROC curve of class {i} (area = {roc_auc[i]:.2f})')

#     plt.plot([0, 1], [0, 1], 'k--')
#     plt.xlim([0.0, 1.0])
#     plt.ylim([0.0, 1.05])
#     plt.xlabel('False Positive Rate')
#     plt.ylabel('True Positive Rate')
#     plt.title('Multiclass ROC Curve')
#     plt.legend(loc="lower right")
#     plt.show()

# # Calculate the probabilities for each class
# model_predictions = []
# model_probabilities = []
# model_probabilities = F.softmax(torch.tensor(model_predictions), dim=0).numpy()

# plot_multiclass_roc_curve(y_test_binarized, model_probabilities, n_classes)

# model.eval()  # Set the model to evaluation mode
# with torch.no_grad():  # Turn off gradients for the following block
#     for data, target in test_loader:
#         data, target = data.to(device), target.to(device)
#         output = model(data)

#         # Get class predictions
#         _, preds = torch.max(output, 1)
#         model_predictions.extend(preds.cpu().numpy())

#         # Get probabilities for the positive class
#         probs = F.softmax(output, dim=1)[:, 1]  # Adjust the index based on your positive class
#         model_probabilities.extend(probs.cpu().numpy())

# # # Specificity = Number of true negatives / (Number of true negatives + number of false positives) =
# # # = proportion of correctly identified individuals without the illness

# # def sensitivity_specificity(conf_matrix):
# #     num_classes = conf_matrix.shape[0]
# #     sensitivity = np.zeros(num_classes)
# #     specificity = np.zeros(num_classes)

# #     for i in range(num_classes):
# #         TP = conf_matrix[i, i]
# #         FN = sum(conf_matrix[i, :]) - TP
# #         FP = sum(conf_matrix[:, i]) - TP
# #         TN = conf_matrix.sum() - (TP + FP + FN)

# #         sensitivity[i] = TP / (TP + FN) if (TP + FN) != 0 else 0
# #         specificity[i] = TN / (TN + FP) if (TN + FP) != 0 else 0

# #     return sensitivity, specificity

# # from sklearn.preprocessing import label_binarize

# # # Binarize the labels for multiclass (suggestion of LLM)
# # y_test_binarized = label_binarize(y_test, classes=np.unique(y_test))


# # # Calculate sensitivity and specificity
# # sensitivity, specificity = sensitivity_specificity(y_test, model_predictions)
# # print(f"Sensitivity: {sensitivity}")
# # print(f"Specificity: {specificity}")


# # ##################################################################################################################################################################

# # # # Display the images, 1 for each class
# # # def display_images(images, titles, num_images):
# # #     plt.figure(figsize=(15, 5))
# # #     for i in range(num_images):
# # #         image = np.squeeze(images[i])  # squeeze to make it easy to print in 2d
# # #         plt.subplot(1, num_images, i + 1)
# # #         plt.imshow(image, cmap='gray')
# # #         plt.title(titles[i])
# # #         plt.axis('off')
# # #     plt.show()
296
+ # # >>>>>>> ab59272 (Net / ResNet / EfficientNet Experiments)
297
+ # # # data_train = np.load(data_train_path)
298
+
299
+
300
+ # # # # Data Verification to check if we all have everything good
301
+ # # # data_shape = data_train.shape
302
+ # # # data_type = data_train.dtype
303
+ # # # labels_shape = y_train.shape
304
+ # # # labels_type = y_train.dtype
305
+ # # # print(f"Data Shape: {data_shape}, Data Type: {data_type}")
306
+ # # # print(f"Labels Shape: {labels_shape}, Labels Type: {labels_type}")
307
+
308
+ # # # # Check the range and distribution of features
309
+ # # # data_range = (np.min(data_train), np.max(data_train))
310
+
311
+ # # # # Label Encoding in accordance to the diseases
312
+ # # # class_names_mapping = {
313
+ # # # 0: 'Atelectasis',
314
+ # # # 1: 'Effusion',
315
+ # # # 2: 'Infiltration',
316
+ # # # 3: 'No Finding',
317
+ # # # 4: 'Nodule',
318
+ # # # 5: 'Pneumonia'
319
+ # # # }
320
+
321
+ # # # print("Unique classes in the training set:")
322
+ # # # for class_id in unique_labels:
323
+ # # # print(f"Class ID {class_id}: {class_names_mapping[class_id]}")
324
+
325
+ # # # # df for distribution analysis
326
+ # # # df_data_range = pd.DataFrame(data_train.reshape(data_train.shape[0], -1))
327
+
328
+
329
+ # # # Calculate the probabilities for each class
330
+ # # model_predictions = []
331
+ # # model_probabilities = []
332
+ # # model_probabilities = F.softmax(torch.tensor(model_predictions), dim=0).numpy()
333
+
334
+ # # plot_multiclass_roc_curve(y_test_binarized, model_probabilities, n_classes)
335
+
336
+
+
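The commented block above calls a `sensitivity_specificity` helper that does not appear anywhere in this diff. For the binary case, a minimal pure-Python version of what such a helper would compute might look like this (the function name and signature are taken from the call above; the implementation itself is an assumption):

```python
def sensitivity_specificity(y_true, y_pred):
    """Binary sensitivity (recall on positives) and specificity (recall on negatives)."""
    # Count the four confusion-matrix cells, with 1 as the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # TP / (TP + FN)
    specificity = tn / (tn + fp)  # TN / (TN + FP)
    return sensitivity, specificity

# Toy example: 2 of 3 positives and 2 of 3 negatives predicted correctly
sens, spec = sensitivity_specificity([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0])
```

In a real run, `y_test` and `model_predictions` from the evaluation loop above would be passed in place of the toy lists.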
dc1/py.typed ADDED
File without changes
dc1/train_test.py ADDED
@@ -0,0 +1,102 @@
+ from tqdm import tqdm
+ import torch
+ import torch.nn as nn
+ import numpy as np
+ from torch.nn import functional as F
+
+ from net import Net, ResNetModel, EfficientNetModel, EfficientNetModel_b7
+ from batch_sampler import BatchSampler
+ from image_dataset import ImageDataset
+
+ from typing import Callable, List, Tuple
+
+ from sklearn.metrics import roc_curve, auc
+ from sklearn.preprocessing import label_binarize
+ import matplotlib.pyplot as plt
+
+
+ def train_model(
+     model: Net,  # CHANGE NN HERE! (e.g. ResNetModel, EfficientNetModel)
+     train_sampler: BatchSampler,
+     optimizer: torch.optim.Optimizer,
+     loss_function: Callable[..., torch.Tensor],
+     device: str,
+ ) -> List[torch.Tensor]:
+     # Let's keep track of all the losses:
+     losses = []
+     # Put the model in train mode:
+     model.train()
+     # Feed all the batches one by one:
+     for batch in tqdm(train_sampler):
+         # Get a batch:
+         x, y = batch
+         # Make sure our samples are stored on the same device as our model:
+         x, y = x.to(device), y.to(device)
+         # Get predictions:
+         predictions = model.forward(x)
+         loss = loss_function(predictions, y)
+         losses.append(loss)
+         # We first need to make sure we reset our optimizer at the start.
+         # We want to learn from each batch separately,
+         # not from the entire dataset at once.
+         optimizer.zero_grad()
+         # We now backpropagate our loss through our model:
+         loss.backward()
+         # We then make the optimizer take a step in the right direction.
+         optimizer.step()
+     return losses
+
+
+ def test_model(
+     model: Net,
+     test_sampler: BatchSampler,
+     loss_function: Callable[..., torch.Tensor],
+     device: str,
+     fpr,
+     tpr,
+     roc,
+ ) -> Tuple[List[torch.Tensor], List[np.ndarray]]:
+     # Set the model to evaluation mode:
+     model.eval()
+     losses = []
+     all_y_pred_probs = []
+     all_y_true = []
+
+     # We need to make sure we do not update our model based on the test data:
+     with torch.no_grad():
+         for (x, y) in tqdm(test_sampler):
+             # Make sure our samples are stored on the same device as our model:
+             x = x.to(device)
+             y = y.to(device)
+             prediction = model.forward(x)
+             loss = loss_function(prediction, y)
+             losses.append(loss)
+             probabilities = F.softmax(prediction, dim=1)
+             all_y_pred_probs.append(probabilities.cpu().numpy())
+             all_y_true.extend(y.cpu().numpy())
+
+     y_pred_probs = np.concatenate(all_y_pred_probs, axis=0)
+     y_true = np.array(all_y_true)
+
+     # NOTE: Comment out this loop and uncomment the binary-class loop below
+     # in order to see the correct binary ROC curve.
+     # Compute the ROC curve and ROC area for each class:
+     for i in range(6):  # 6 classes
+         a, b, _ = roc_curve(y_true == i, y_pred_probs[:, i])
+         fpr[i].extend(a)
+         tpr[i].extend(b)
+         roc[i] = auc(fpr[i], tpr[i])
+
+     # # ROC for binary classification:
+     # for i in range(2):
+     #     fpr[i], tpr[i], _ = roc_curve(y_true == i, y_pred_probs[:, i])
+     #     roc[i] = auc(fpr[i], tpr[i])
+
+     return losses, y_pred_probs
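The per-class loop at the end of `test_model` can be exercised on its own with synthetic predictions. This sketch mirrors the one-vs-rest `roc_curve`/`auc` calls above; the class count, the random data, and the plain-dict containers standing in for the repo's `fpr`/`tpr`/`roc` arguments are all assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 6

# Synthetic ground truth and a softmax-like probability matrix,
# biased so the true class tends to get the highest probability
y_true = rng.integers(0, n_classes, size=n_samples)
logits = rng.normal(size=(n_samples, n_classes))
logits[np.arange(n_samples), y_true] += 2.0
y_pred_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

fpr, tpr, roc = {}, {}, {}
for i in range(n_classes):  # one-vs-rest ROC per class, as in test_model
    fpr[i], tpr[i], _ = roc_curve(y_true == i, y_pred_probs[:, i])
    roc[i] = auc(fpr[i], tpr[i])
```

Because the logits are biased toward the true class, every per-class AUC should land well above the 0.5 chance level.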
dc1/visualise_performance_metrics.py ADDED
@@ -0,0 +1,24 @@
+ from pathlib import Path
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import torch
+ from itertools import cycle
+
+ from sklearn.metrics import (
+     confusion_matrix,
+     ConfusionMatrixDisplay,
+     classification_report,
+     roc_curve,
+     auc,
+     RocCurveDisplay,
+     roc_auc_score,
+ )
+ from sklearn.preprocessing import label_binarize, LabelBinarizer
+
+ from train_test import test_model
+ from net import Net
+ from batch_sampler import BatchSampler
+ from image_dataset import ImageDataset
+
+
+ def create_confusion_matrix(true_labels, predicted_labels):
+     cm = confusion_matrix(true_labels, predicted_labels)
+     # Display it as a heatmap:
+     disp = ConfusionMatrixDisplay(confusion_matrix=cm)
+     disp.plot(cmap=plt.cm.Blues)
+     plt.title('Confusion Matrix')
+     plt.show()
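`create_confusion_matrix` above delegates the counting to scikit-learn and only handles plotting. The matrix it displays can be sketched by hand to make the convention explicit (the helper name `confusion_counts` is hypothetical, not part of the repo):

```python
import numpy as np

def confusion_counts(true_labels, predicted_labels, n_classes):
    # cm[i, j] = number of samples with true class i predicted as class j
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

# Toy example with 3 classes: rows are true labels, columns are predictions
cm = confusion_counts([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0], n_classes=3)
```

The diagonal holds the correct predictions, which is exactly what the heatmap in `create_confusion_matrix` highlights.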
mypy.ini ADDED
@@ -0,0 +1,2 @@
+ [mypy]
+ exclude = venv
requirements.txt ADDED
@@ -0,0 +1,27 @@
+ charset_normalizer==2.1.1
+ certifi==2023.7.22
+ colorama==0.4.6
+ fonttools==4.43.0
+ idna==3.4
+ kiwisolver==1.4.4
+ matplotlib==3.6.2
+ mypy==0.991
+ numpy==1.24.1
+ packaging==22.0
+ pillow==10.2.0
+ plotext==5.2.8
+ pyparsing==3.0.9
+ python-dateutil==2.8.2
+ requests==2.31.0
+ six==1.16.0
+ torch==1.13.1
+ torchsummary==1.5.1
+ tqdm==4.64.1
+ types-requests==2.28.11.7
+ types-setuptools==65.6.0.2
+ types-tqdm==4.64.1
+ urllib3==1.26.18
+ torchvision==0.17.1
+
+ scikit-learn~=1.4.1.post1
+ setuptools~=58.1.0
setup.py ADDED
@@ -0,0 +1,17 @@
+ from setuptools import setup
+ from pathlib import Path
+
+ with open(Path("requirements.txt"), "r") as requirements:
+     dependencies = requirements.readlines()
+
+ setup(
+     name='Data-Challenge-1-template',
+     version='1.0.0',
+     packages=['dc1'],
+     url='',
+     license='',
+     author='',
+     author_email='',
+     description='',
+     install_requires=dependencies,
+ )
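One small caveat about `setup.py` above: `readlines()` returns each requirement with its trailing newline, and blank lines pass through as well. setuptools tolerates this, but stripping gives a cleaner `install_requires` list. A sketch against a throwaway requirements file (the two pinned packages here are stand-ins, not the real list):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny stand-in requirements file, including a blank line
    req = Path(tmp) / "requirements.txt"
    req.write_text("numpy==1.24.1\ntqdm==4.64.1\n\n")
    # Strip trailing newlines and drop blank lines before handing
    # the entries to setuptools
    with open(req, "r") as requirements:
        dependencies = [line.strip() for line in requirements if line.strip()]
```

`dependencies` can then be passed to `install_requires` exactly as in the `setup()` call above.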