ummtushar committed on
Commit 60b0ddc · verified · 1 Parent(s): a56b30f

initial commit
README.md CHANGED
@@ -1,3 +1,98 @@
- ---
- license: mit
- ---
+ # How to run the code
+ Here is a brief guide to running our repository:
+ 1. Create a virtual environment
+ 2. Run `pip install -r requirements.txt` in the terminal
+ 3. Run the `dc1\image_dataset.py` file to download the data
+ 4. Run `dc1\main.py` to train and test the model
+
+ # How to change to a Selected Model
+ 1. In `main.py` you can select the model by commenting out the models you are not interested in and uncommenting the one you would like to use.
+    Exception: for the binary model you also need to comment out the following:
+    `train_dataset = ImageDataset(Path("data/X_train.npy"), Path("data/Y_train.npy"))` and `test_dataset = ImageDataset(Path("data/X_test.npy"), Path("data/Y_test.npy"))`
+    and uncomment the loading of the binary dataset:
+    `train_dataset = ImageDatasetBINARY(Path("data/X_train.npy"), Path("data/Y_train.npy"))` and `test_dataset = ImageDatasetBINARY(Path("data/X_test.npy"), Path("data/Y_test.npy"))`
+
+ # Data-Challenge-1-template-code
+ This repository contains the template code for the TU/e course JBG040 Data Challenge 1.
+ Please read this document carefully as it has been filled out with important information.
+
+ ## Code structure
+ The template code is structured into multiple files, based on their functionality.
+ There are five `.py` files in total, each containing a different part of the code.
+ Feel free to create new files to explore the data or experiment with other ideas.
+
+ - To download the data: run the `image_dataset.py` file. The script will create a directory `/data/` and download the training and test data with corresponding labels to this directory.
+   You will usually only have to run this script once, at the beginning of your project.
+ - To run the whole training/evaluation pipeline: run `main.py`. This script is prepared to do the following:
+   - Load your train and test data (make sure it is downloaded beforehand!)
+   - Initialize the neural network as defined in the `net.py` file.
+   - Initialize loss functions and optimizers. If you want to change the loss function/optimizer, do it here.
+   - Define the number of training epochs and the batch size.
+   - Check for and enable GPU acceleration for training (if you have a CUDA- or Apple Silicon-enabled device).
+   - Train the neural network and evaluate it on the test set at the end of each epoch.
+   - Plot the training losses, both in the command line during training and as a PNG saved in the `/artifacts/` subdirectory.
+   - Finally, save your trained model's weights in the `/model_weights/` subdirectory so that you can reload them later.
+
+ In your project, you are free to modify any part of this code based on your needs.
+ Note that the neural network structure is defined in the `net.py` file, so if you want to modify the network itself, you can do so in that script.
+ The loss functions and optimizers are all defined in `main.py`.
+
+ ## GitHub setup instructions
+ 1. Click the green *<> Code* button at the upper right corner of the repository.
+ 2. Make sure that the tab *Local* is selected and click *Download ZIP*.
+ 3. Go to the GitHub homepage and create a new repository.
+ 4. Make sure that the repository is set to **private** and give it the name **JBG040-GroupXX**, where XX is your group number.
+ 5. Press *uploading an existing file* and upload the extracted files from Data-Challenge-1-template-main.zip to your repository. Note that for the initial commit you should commit directly to the main branch.
+ 6. Invite your **group members, tutor and teachers** by going to *Settings > Collaborators > Add people*.
+ 7. Open PyCharm and make sure that your GitHub account is linked.*
+ 8. In the welcome screen of PyCharm, click *Get from VCS > GitHub*, select your repository and click *Clone*.
+ 9. After the repository is cloned, you can create a virtual environment using the requirements.txt.
+
+ *For information on how to install PyCharm and link GitHub to your PyCharm, we refer to the additional resources page on Canvas.
+
+ ## Environment setup instructions
+ We recommend setting up a virtual Python environment to install the package and its dependencies. To install the package, execute `pip install -r requirements.txt` in the command line. This will install it in editable mode, meaning there is no need to reinstall after making changes. If you are using PyCharm, it should offer you the option to create a virtual environment from the requirements file on startup. Note that even in this case, it is still necessary to run the pip command described above.
+
+ ## Submission instructions
+ After each sprint, you are expected to submit your code. This will **not** be done in Canvas; instead, you will create a release of your current repository.
+ A release is essentially a snapshot of your repository taken at a specific time.
+ Your future modifications are not going to affect this release.
+ **Note that you are not allowed to update your old releases after the deadline.**
+ For more information on releases, see the [GitHub releases](https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases) page.
+
+ 1. Make sure that your code runs without issues and that **everything is pushed to the main branch**.
+ 2. Head over to your repository and click on *Releases* (located at the right-hand side).
+ 3. Click on the green button *Create a new release*.*
+ 4. Click on *Choose a tag*.
+ 5. Fill in the textbox with **SprintX**, where X is the current sprint number, and press *Create new tag: SprintX*.
+ 6. Make sure that *Target: main* or *Target: master* (depending on your main/master branch) is selected, so that the code release will be based on your main branch.
+ 7. Fill in the title of the release with **Group XX Sprint X**, where XX is your group number and X is the current sprint number.
+ 8. Click the *Publish release* button to create a release for your sprint.
+ 9. **Verify** that your release has been successfully created by heading over to your repository and pressing the *Releases* button once again. There you should be able to see your newly created release.
+
+ *After the first release, you should click *Draft a new release* instead of *Create a new release*.
+
+ ## Mypy
+ The template is created with support for full type hints. This enables the use of a powerful tool called `mypy`: code with type hints can be statically checked using it. It is recommended to use this tool, as it can increase confidence in the correctness of the code before testing it. Note that usage of this tool, and of type hints in general, is entirely up to the students and not enforced in any way. To execute the tool, simply run `mypy .`. For more information, see https://mypy.readthedocs.io/en/latest/faq.html
+
+ ## Argparse
+ Argparse functionality is included in the `main.py` file. This means the file can be run from the command line while passing arguments to the main function. Right now, there are arguments included for the number of epochs (`nb_epochs`), the batch size (`batch_size`), and whether to create balanced batches (`balanced_batches`). You are free to add or remove arguments as you see fit.
+
+ To make use of this functionality, first open the command prompt and change to the directory containing the `main.py` file.
+ For example, if your main file is in `C:\Data-Challenge-1-template-main\dc1\`,
+ type `cd C:\Data-Challenge-1-template-main\dc1\` into the command prompt and press enter.
+
+ Then, `main.py` can be run by, for example, typing `python main.py --nb_epochs 10 --batch_size 25`.
+ This would run the script with 10 epochs, a batch size of 25, and balanced batches, which is also the current default.
+ If you would want to run the script with 20 epochs, a batch size of 5, and batches that are not balanced,
+ you would type `python main.py --nb_epochs 20 --batch_size 5 --no-balanced_batches`.
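The command-line interface described above can be sketched with Python's standard `argparse` module. This is a hypothetical minimal reconstruction for illustration; the actual definitions live in `main.py` and may differ in defaults:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the three arguments the README describes.
    parser = argparse.ArgumentParser(description="Train and test the DC1 model")
    parser.add_argument("--nb_epochs", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=25,
                        help="number of samples per batch")
    # BooleanOptionalAction (Python 3.9+) generates both --balanced_batches
    # and --no-balanced_batches flags automatically.
    parser.add_argument("--balanced_batches", action=argparse.BooleanOptionalAction,
                        default=True, help="draw class-balanced batches")
    return parser

# Parse the example command from the README:
args = build_parser().parse_args(
    ["--nb_epochs", "20", "--batch_size", "5", "--no-balanced_batches"]
)
print(args.nb_epochs, args.batch_size, args.balanced_batches)  # 20 5 False
```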
dc1/.DS_Store ADDED
Binary file (8.2 kB)

dc1/__init__.py ADDED
@@ -0,0 +1 @@
+
dc1/__pycache__/__init__.cpython-39.pyc ADDED
Binary file (159 Bytes)

dc1/__pycache__/batch_sampler.cpython-39.pyc ADDED
Binary file (3.05 kB)

dc1/__pycache__/image_dataset.cpython-39.pyc ADDED
Binary file (3.83 kB)

dc1/__pycache__/image_dataset_BINARY.cpython-39.pyc ADDED
Binary file (4.02 kB)

dc1/__pycache__/main.cpython-39.pyc ADDED
Binary file (3.99 kB)

dc1/__pycache__/net.cpython-39.pyc ADDED
Binary file (3.59 kB)

dc1/__pycache__/net_BINARY.cpython-39.pyc ADDED
Binary file (1.38 kB)

dc1/__pycache__/train_test.cpython-39.pyc ADDED
Binary file (2.16 kB)

dc1/__pycache__/visualise_performance_metrics.cpython-39.pyc ADDED
Binary file (2.02 kB)

dc1/artifacts/artifacts/session_02_25_16_52.png ADDED
dc1/artifacts/session_030412_45.png ADDED
dc1/artifacts/session_030622_51.png ADDED
dc1/artifacts/session_031915_58.png ADDED
dc1/batch_sampler.py ADDED
@@ -0,0 +1,81 @@
+ import numpy as np
+ import random
+ import torch
+ from image_dataset import ImageDataset
+ from typing import Generator, Tuple
+
+
+ class BatchSampler:
+     """
+     Implements an iterable which, given a torch dataset and a batch_size,
+     will produce batches of data of that given size. The batches are
+     returned as tuples in the form (images, labels).
+     Can produce balanced batches, where each batch will have an equal
+     amount of samples from each class in the dataset. If your dataset is heavily
+     imbalanced, this might mean throwing away a lot of samples from
+     over-represented classes!
+     """
+
+     def __init__(self, batch_size: int, dataset: ImageDataset, balanced: bool = False) -> None:
+         self.batch_size = batch_size
+         self.dataset = dataset
+         self.balanced = balanced
+         if self.balanced:
+             # Counting the occurrence of the class labels:
+             unique, counts = np.unique(self.dataset.targets, return_counts=True)
+             indexes = []
+             # Sampling an equal amount from each class:
+             for i in range(len(unique)):
+                 indexes.append(
+                     np.random.choice(
+                         np.where(self.dataset.targets == i)[0],
+                         size=counts.min(),
+                         replace=False,
+                     )
+                 )
+             # Setting the indexes we will sample from later:
+             self.indexes = np.concatenate(indexes)
+         else:
+             # Setting the indexes we will sample from later (all indexes):
+             self.indexes = [i for i in range(len(dataset))]
+
+     def __len__(self) -> int:
+         # Ceiling division: a partial final batch still counts as one batch.
+         return -(-len(self.indexes) // self.batch_size)
+
+     def shuffle(self) -> None:
+         random.shuffle(self.indexes)
+
+     def __iter__(self) -> Generator[Tuple[torch.Tensor, torch.Tensor], None, None]:
+         remaining = False
+         self.shuffle()
+         # Go over the dataset in steps of 'self.batch_size':
+         for i in range(0, len(self.indexes), self.batch_size):
+             # If our current batch is larger than the remaining data, we quit:
+             if i + self.batch_size > len(self.indexes):
+                 remaining = True
+                 break
+             # If not, we yield a complete batch:
+             else:
+                 # Getting a list of samples from the dataset, given the indexes we defined:
+                 X_batch = [
+                     self.dataset[self.indexes[k]][0]
+                     for k in range(i, i + self.batch_size)
+                 ]
+                 Y_batch = [
+                     self.dataset[self.indexes[k]][1]
+                     for k in range(i, i + self.batch_size)
+                 ]
+                 # Stacking all the samples and returning the target labels as a tensor:
+                 yield torch.stack(X_batch).float(), torch.tensor(Y_batch).long()
+         # If there is still data left that was not a full batch:
+         if remaining:
+             # Return the last batch (smaller than batch_size):
+             X_batch = [
+                 self.dataset[self.indexes[k]][0] for k in range(i, len(self.indexes))
+             ]
+             Y_batch = [
+                 self.dataset[self.indexes[k]][1] for k in range(i, len(self.indexes))
+             ]
+             yield torch.stack(X_batch).float(), torch.tensor(Y_batch).long()
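The balanced-sampling step above draws an equal number of indexes from every class, capped at the size of the smallest class. The idea can be illustrated without torch or numpy; `balanced_indexes` below is a hypothetical stdlib-only sketch, not the `BatchSampler` class itself:

```python
import random
from collections import defaultdict

def balanced_indexes(targets, seed=0):
    """Return sample indexes with an equal count per class, capped at the
    smallest class size (the idea behind BatchSampler's balanced mode)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)
    # Every class contributes as many samples as the rarest class has:
    smallest = min(len(v) for v in by_class.values())
    picked = []
    for label in sorted(by_class):
        picked.extend(rng.sample(by_class[label], smallest))
    return picked

labels = [0, 0, 0, 1, 1, 2]      # class 2 is rarest, so 1 sample per class
print(len(balanced_indexes(labels)))  # 3
```

Note that, exactly as the class docstring warns, the two extra class-0 samples and one extra class-1 sample are discarded.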
dc1/confusion_matrix.png ADDED
dc1/data/X_train.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a33632464abaa758eb0d3959466dd148972a75d1ca9afc5839335754aabc92c0
+ size 275923072
dc1/data/Y_test.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:163a8d71c898f1556d0dd90ffbd44017f75fb5cc5de968066671f09aa92bc997
+ size 33808
dc1/data/Y_train.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3bbdfe89af848f932ea8fe330d29637a4a0e3f4d451e8ee05bd5e3107877dfb4
+ size 67492
dc1/eda.py ADDED
@@ -0,0 +1,161 @@
+ # # Imports
+ # import numpy as np
+ # import matplotlib.pyplot as plt
+ # import random
+
+ # maincolor = '#4a8cffff'
+ # secondcolor = '#e06666'
+
+ # NOTE: File used at the very beginning of the project. Please ignore!
+
+ # # Relative Path PUT YOUR PATHS HERE
+ # path = 'dc1/data/X_train.npy'
+
+ # data = np.load(path)
+
+
+ # # Display some images to see what we are working on
+ # def display_images(images, num_images=5):
+ #     plt.figure(figsize=(15, 3))
+ #     for i in range(num_images):
+ #         plt.subplot(1, num_images, i + 1)
+ #         plt.imshow(images[i].squeeze(), cmap='gray')
+ #         plt.axis('off')
+ #     plt.show()
+
+
+ # # function call
+ # display_images(data)
+
+
+ # ########################################################################################################################
+
+ # # # 1. Statistical Analysis
+ # # def compute_statistics(images):
+ # #     # flatten the images since x/y are irrelevant
+ # #     flattened_images = images.flatten()
+ # #     mean_val = np.mean(flattened_images)
+ # #     median_val = np.median(flattened_images)
+ # #     std_dev_val = np.std(flattened_images)
+
+ # #     return mean_val, median_val, std_dev_val
+
+
+ # # # Compute and print the statistics
+ # # mean_val, median_val, std_dev_val = compute_statistics(data)
+ # # print(f"Mean pixel intensity: {mean_val}")
+ # # print(f"Median pixel intensity: {median_val}")
+ # # print(f"Standard deviation of pixel intensities: {std_dev_val}")
+
+ # # # Global statistics
+ # # global_mean = np.mean(data)
+ # # global_std = np.std(data)
+ # # # Individual image statistics
+ # # image_means = np.mean(data, axis=(1, 2, 3))
+ # # image_stds = np.std(data, axis=(1, 2, 3))
+ # # # Outlier thresholds
+ # # upper_threshold = global_mean + 3 * global_std
+ # # lower_threshold = global_mean - 3 * global_std
+ # # outlier_indices = np.where((image_means > upper_threshold) | (image_means < lower_threshold))[0]
+ # # print(f"Found {len(outlier_indices)} potential outliers based on pixel intensity means.")
+
+
+ # # ########################################################################################################################
+
+ # # # 2. Histogram Analysis
+ # # def plot_histogram(images, title="Pixel Intensity Distribution"):
+ # #     flattened_images = images.flatten()
+
+ # #     # Customize plot aesthetics
+ # #     plt.figure(figsize=(10, 6))
+ # #     plt.hist(flattened_images, bins=256, range=(0, 255), color=maincolor, alpha=0.75)
+
+ # #     # Adding grid, title, and labels with improved aesthetics
+ # #     plt.grid(axis='y', alpha=0.75)
+ # #     plt.title(title, fontsize=15, color='#333333')
+ # #     plt.xlabel('Pixel Intensity', fontsize=12, color='#333333')
+ # #     plt.ylabel('Frequency', fontsize=12, color='#333333')
+
+ # #     # Customizing tick marks for better readability
+ # #     plt.xticks(fontsize=10, color='#333333')
+ # #     plt.yticks(fontsize=10, color='#333333')
+
+ # #     # Adding a background color to the plot for contrast
+ # #     ax = plt.gca()  # Get current axes
+ # #     ax.set_facecolor('#f0f0f0')
+ # #     ax.figure.set_facecolor('#f8f8f8')
+
+ # #     # Add a border around the plot for a more polished look
+ # #     for spine in ax.spines.values():
+ # #         spine.set_edgecolor('#d0d0d0')
+
+ # #     plt.show()
+
+ # # # Plot histogram for the entire dataset
+ # # plot_histogram(data, title="Pixel Intensity Distribution Across Entire Dataset")
+ # # # Plot a selected image
+ # # plot_histogram(data[10], title="Pixel Intensity Distribution of a Selected Image")
+
+
+ # def plot_histogram_with_images(images, num_images=5):
+ #     # Select a set of random images
+ #     random_indices = random.sample(range(images.shape[0]), num_images)
+
+ #     for index in random_indices:
+ #         # Extract a single image
+ #         single_xray_image = images[index]
+
+ #         # Flatten the image for the histogram
+ #         flattened_image = single_xray_image.flatten()
+
+ #         # Create a figure with 2 subplots
+ #         fig, axs = plt.subplots(1, 2, figsize=(12, 6))
+
+ #         # Plot histogram on the first subplot
+ #         axs[0].hist(flattened_image, bins=256, range=(0, 255), color=maincolor, alpha=0.75)
+ #         axs[0].set_title('Pixel Intensity Distribution')
+ #         axs[0].set_xlabel('Pixel Intensity')
+ #         axs[0].set_ylabel('Frequency')
+ #         axs[0].set_ylim(0, 600)
+ #         axs[0].grid(True)
+
+ #         # Show the image on the second subplot
+ #         axs[1].imshow(single_xray_image.squeeze(), cmap='gray')
+ #         axs[1].set_title('X-Ray Image')
+ #         axs[1].axis('off')
+
+ #         plt.tight_layout()
+ #         plt.show()
+
+ # plot_histogram_with_images(data)
+
+
+ # # 3. Plot for Accuracy, Precision and Recall
+ # def plot_metrics_evolution(epochs, accuracy, precision):
+ #     import matplotlib.pyplot as plt
+
+ #     plt.rcParams.update({'font.size': 12})
+
+ #     plt.figure(figsize=(12, 8))
+
+ #     plt.plot(epochs, accuracy, label='Accuracy', marker='o', linestyle='-', color=maincolor)
+ #     plt.plot(epochs, precision, label='Precision', marker='s', linestyle='--', color=secondcolor)
+ #     # plt.plot(epochs, recall, label='Recall', marker='^', linestyle='-.', color='red')
+
+ #     plt.title('Model Performance Over 10 Epochs')
+ #     plt.xlabel('Epoch')
+ #     plt.ylabel('Score')
+ #     plt.xticks(epochs)
+
+ #     plt.legend()
+ #     plt.grid(True)
+ #     plt.show()
+
+ # # Data
+ # epochs = list(range(1, 11))
+ # accuracy = [0.1884, 0.1968, 0.1985, 0.2200, 0.2122, 0.2208, 0.2340, 0.2337, 0.2318, 0.2384]
+ # precision = [0.1664, 0.3518, 0.2644, 0.3144, 0.3137, 0.3212, 0.2983, 0.3108, 0.2635, 0.3081]
+ # # recall = [0.1884, 0.1968, 0.1985, 0.2200, 0.2122, 0.2208, 0.2340, 0.2337, 0.2318, 0.2384]
+
+ # # Example function call
+ # plot_metrics_evolution(epochs, accuracy, precision)
dc1/image_dataset.py ADDED
@@ -0,0 +1,114 @@
+ import numpy as np
+ import torch
+ import torchvision.transforms as T
+ import torchvision.transforms.functional as TF
+ import requests
+ import io
+ from os import path
+ from typing import Tuple, List
+ from pathlib import Path
+ import os
+
+
+ class ImageDataset:
+     """
+     Creates a DataSet from numpy arrays while keeping the data
+     in the more efficient numpy arrays for as long as possible and only
+     converting to torch tensors when needed (torch tensors are the objects used
+     to pass the data through the neural network and apply weights).
+     """
+
+     def __init__(self, x: Path, y: Path) -> None:
+         # Target labels
+         self.targets = ImageDataset.load_numpy_arr_from_npy(y)
+         # Images
+         self.imgs = ImageDataset.load_numpy_arr_from_npy(x)
+
+     def __len__(self) -> int:
+         return len(self.targets)
+
+     def __getitem__(self, idx: int) -> Tuple[torch.Tensor, np.ndarray]:
+         # Template code
+         image = torch.from_numpy(self.imgs[idx] / 255).float()
+         label = self.targets[idx]
+
+         # Preprocessing
+         # Metrics for normalization of the images
+         mean = image.mean()
+         std = image.std()
+
+         # Compose: composes several transforms together (torch documentation)
+         compose = T.Compose([
+             T.Normalize(mean, std),  # Normalization
+             T.Resize(156),  # Resizing to 156x156
+             T.CenterCrop(128),  # Cropping to focus on the center 128x128 region
+             T.Lambda(lambda x: TF.rotate(x, angle=90)),  # Rotating by 90 degrees
+             T.RandomHorizontalFlip(p=0.5),  # Random horizontal flip with a 50% probability
+             T.RandomVerticalFlip(p=0.5),  # Random vertical flip with a 50% probability
+             T.Lambda(lambda x: x + torch.randn_like(x) * 0.1)  # Adding random noise
+         ])
+
+         # Apply the transformations defined by compose
+         image = compose(image)
+
+         return image, label
+
+     def get_labels(self) -> List[np.ndarray]:
+         return self.targets.tolist()
+
+     @staticmethod
+     def load_numpy_arr_from_npy(path: Path) -> np.ndarray:
+         """
+         Loads a numpy array from local storage.
+
+         Input:
+         path: local path of file
+
+         Outputs:
+         dataset: numpy array with input features or labels
+         """
+
+         return np.load(path)
+
+
+ def load_numpy_arr_from_url(url: str) -> np.ndarray:
+     """
+     Loads a numpy array from surfdrive.
+
+     Input:
+     url: Download link of dataset
+
+     Outputs:
+     dataset: numpy array with input features or labels
+     """
+
+     response = requests.get(url)
+     response.raise_for_status()
+
+     return np.load(io.BytesIO(response.content))
+
+
+ if __name__ == "__main__":
+     cwd = os.getcwd()
+     if path.exists(path.join(cwd, "data")):
+         print("Data directory exists, files may be overwritten!")
+     else:
+         os.mkdir(path.join(cwd, "data"))
+     ### Load labels
+     train_y = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/i6MvQ8nqoiQ9Tci/download"
+     )
+     np.save("data/Y_train.npy", train_y)
+     test_y = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/wLXiOjVAW4AWlXY/download"
+     )
+     np.save("data/Y_test.npy", test_y)
+     ### Load data
+     train_x = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/4rwSf9SYO1ydGtK/download"
+     )
+     np.save("data/X_train.npy", train_x)
+     test_x = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/dvY2LpvFo6dHef0/download"
+     )
+     np.save("data/X_test.npy", test_x)
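The `T.Normalize(mean, std)` step in `__getitem__` standardizes each image with its own statistics, so every image ends up zero-mean and unit-variance before augmentation. A stdlib-only sketch of that idea on a flat list of pixel values (`normalize` is a hypothetical helper for illustration, not part of the repository):

```python
def normalize(pixels):
    """Standardize pixel values: subtract the mean, divide by the standard
    deviation -- the per-image role T.Normalize(mean, std) plays above."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = (sum((p - mean) ** 2 for p in pixels) / n) ** 0.5
    if std == 0:  # guard against constant images
        std = 1.0
    return [(p - mean) / std for p in pixels]

z = normalize([0, 50, 100, 150, 200])
print(round(sum(z), 6))  # 0.0 -- the standardized image is zero-mean
```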
dc1/image_dataset_BINARY.py ADDED
@@ -0,0 +1,125 @@
+ import numpy as np
+ import torch
+ import torchvision.transforms as T
+ import torchvision.transforms.functional as TF
+ import requests
+ import io
+ from os import path
+ from typing import Tuple, List
+ from pathlib import Path
+ import os
+
+
+ class ImageDatasetBINARY:
+     """
+     Creates a DataSet from numpy arrays while keeping the data
+     in the more efficient numpy arrays for as long as possible and only
+     converting to torch tensors when needed (torch tensors are the objects used
+     to pass the data through the neural network and apply weights).
+     """
+
+     def __init__(self, x: Path, y: Path) -> None:
+         # Target labels
+         self.targets = ImageDatasetBINARY.load_numpy_arr_from_npy(y)
+         # Images
+         self.imgs = ImageDatasetBINARY.load_numpy_arr_from_npy(x)
+         # Division into:
+         # SICK = 1
+         self.targets[self.targets == 0] = 1  # Atelectasis to SICK
+         self.targets[self.targets == 1] = 1  # Effusion to SICK
+         self.targets[self.targets == 2] = 1  # Infiltration to SICK
+         self.targets[self.targets == 4] = 1  # Nodule to SICK
+         self.targets[self.targets == 5] = 1  # Pneumonia to SICK
+
+         # NON SICK = 0
+         self.targets[self.targets == 3] = 0  # No Finding to NON SICK
+
+     def __len__(self) -> int:
+         return len(self.targets)
+
+     def __getitem__(self, idx: int) -> Tuple[torch.Tensor, np.ndarray]:
+         # Template code
+         image = torch.from_numpy(self.imgs[idx] / 255).float()
+         label = self.targets[idx]
+
+         # Metrics for normalization of the images
+         mean = image.mean()
+         std = image.std()
+
+         # Compose: composes several transforms together (torch documentation)
+         compose = T.Compose([
+             T.Normalize(mean, std),  # Normalization
+             T.Resize(156),  # Resizing to 156x156
+             T.CenterCrop(128),  # Cropping to focus on the center 128x128 region
+             T.Lambda(lambda x: TF.rotate(x, angle=90)),  # Rotating by 90 degrees
+             T.RandomHorizontalFlip(p=0.5),  # Random horizontal flip with a 50% probability
+             T.RandomVerticalFlip(p=0.5),  # Random vertical flip with a 50% probability
+             T.Lambda(lambda x: x + torch.randn_like(x) * 0.1)  # Adding random noise
+         ])
+
+         # Apply the transformations defined by compose
+         image = compose(image)
+
+         return image, label
+
+     def get_labels(self) -> List[np.ndarray]:
+         return self.targets.tolist()
+
+     @staticmethod
+     def load_numpy_arr_from_npy(path: Path) -> np.ndarray:
+         """
+         Loads a numpy array from local storage.
+
+         Input:
+         path: local path of file
+
+         Outputs:
+         dataset: numpy array with input features or labels
+         """
+
+         return np.load(path)
+
+
+ def load_numpy_arr_from_url(url: str) -> np.ndarray:
+     """
+     Loads a numpy array from surfdrive.
+
+     Input:
+     url: Download link of dataset
+
+     Outputs:
+     dataset: numpy array with input features or labels
+     """
+
+     response = requests.get(url)
+     response.raise_for_status()
+
+     return np.load(io.BytesIO(response.content))
+
+
+ if __name__ == "__main__":
+     cwd = os.getcwd()
+     if path.exists(path.join(cwd, "data")):
+         print("Data directory exists, files may be overwritten!")
+     else:
+         os.mkdir(path.join(cwd, "data"))
+     ### Load labels
+     train_y = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/i6MvQ8nqoiQ9Tci/download"
+     )
+     np.save("data/Y_train.npy", train_y)
+     test_y = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/wLXiOjVAW4AWlXY/download"
+     )
+     np.save("data/Y_test.npy", test_y)
+     ### Load data
+     train_x = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/4rwSf9SYO1ydGtK/download"
+     )
+     np.save("data/X_train.npy", train_x)
+     test_x = load_numpy_arr_from_url(
+         url="https://surfdrive.surf.nl/files/index.php/s/dvY2LpvFo6dHef0/download"
+     )
+     np.save("data/X_test.npy", test_x)
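The relabelling block in `ImageDatasetBINARY.__init__` boils down to one rule: label 3 ("No Finding") becomes 0 (NON SICK), every other class becomes 1 (SICK). A minimal stdlib sketch of that mapping (`to_binary` is a hypothetical helper, shown only to make the rule explicit):

```python
def to_binary(labels):
    # Mirror of the remapping in ImageDatasetBINARY:
    # 3 ("No Finding") -> 0 (NON SICK); all other classes -> 1 (SICK).
    return [0 if y == 3 else 1 for y in labels]

print(to_binary([0, 1, 2, 3, 4, 5]))  # [1, 1, 1, 0, 1, 1]
```

Expressing the rule as a single conditional also avoids the ordering pitfalls of sequential in-place masks (which happen to be safe in the original only because label 3 is remapped last).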
dc1/main.py ADDED
@@ -0,0 +1,256 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Custom imports
2
+ from batch_sampler import BatchSampler
3
+ from image_dataset import ImageDataset
4
+ from net import Net, ResNetModel, EfficientNetModel, EfficientNetModel_b7
5
+ from train_test import train_model, test_model
6
+ from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay,classification_report
7
+ from visualise_performance_metrics import create_confusion_matrix, ROC_multiclass
8
+ from image_dataset_BINARY import ImageDatasetBINARY
9
+ from net_BINARY import Net_BINARY
10
+ # Torch imports
11
+ import torch
12
+ import torch.nn as nn
13
+ import torch.optim as optim
14
+ from torchsummary import summary
15
+
16
+ # Other imports
17
+ import matplotlib.pyplot as plt
18
+ from matplotlib.pyplot import figure
19
+ import os
20
+ import argparse
21
+ import plotext
22
+ from datetime import datetime
23
+ from pathlib import Path
24
+ from typing import List
25
+ from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_curve
26
+ import numpy as np
27
+
28
+ def main(args: argparse.Namespace, activeloop: bool = True) -> None:
29
+ # NOTE: Uncomment the dataset you would like to use. For instance, if you would like to run the Binary Model,
30
+ # you would need to comment the ImageDataset Class lines, or viceversa if you are running the other models.
31
+ # Load the train and test data set
32
+ train_dataset = ImageDataset(Path('dc1/data/X_train.npy'), Path('dc1/data/Y_train.npy'))
33
+ test_dataset = ImageDataset(Path('dc1/data/X_test.npy'), Path('dc1/data/Y_test.npy'))
34
+
35
+ # Load the BINARY train and BINARY test data set
36
+ # train_dataset = ImageDatasetBINARY(Path('dc1/data/X_train.npy'), Path('dc1/data/Y_train.npy'))
37
+ # test_dataset = ImageDatasetBINARY(Path('dc1/data/X_test.npy'), Path('dc1/data/Y_test.npy'))
38
+
39
+ # Load the Neural Net.
40
+ # NOTE: set number of distinct labels here
41
+ # NOTE: uncomment when you need to use one of the models
42
+ # Improved Net
43
+ # model = Net(n_classes=6)
44
+ # ResNet Pre-Trained Model
45
+ # model = ResNetModel(n_classes=6)
46
+ # EfficientNet Model Pre-Trained == OUR SELECTED MODEL
47
+ model = EfficientNetModel(n_classes=6)
48
+ # Binary Model
49
+ # model = Net_BINARY(n_classes=2)
50
+
51
+ # Initialize optimizer(s) and loss function(s)
52
+ optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.1)
53
+ optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.1)
54
+ loss_function = nn.CrossEntropyLoss()
55
+
# fetch epoch and batch count from arguments
n_epochs = args.nb_epochs
batch_size = args.batch_size

# IMPORTANT! Set this to True to see actual errors regarding
# the structure of your model (GPU acceleration hides them)!
# Also make sure you set this to False again for actual model training,
# as training your model with GPU acceleration (CUDA/MPS) is much faster.
DEBUG = False

# fpr = {x: [] for x in range(6)}
# tpr = {x: [] for x in range(6)}
# auc = {}

# Moving our model to the right device (CUDA will speed training up significantly!)
if torch.cuda.is_available() and not DEBUG:
    print("@@@ CUDA device found, enabling CUDA training...")
    device = "cuda"
    model.to(device)
    # Creating a summary of our model and its layers:
    summary(model, (1, 128, 128), device=device)
elif (
    torch.backends.mps.is_available() and not DEBUG
):  # PyTorch supports Apple Silicon GPUs from version 1.12
    print("@@@ Apple silicon device enabled, training with Metal backend...")
    device = "mps"
    model.to(device)
else:
    print("@@@ No GPU boosting device found, training on CPU...")
    device = "cpu"
    # Creating a summary of our model and its layers:
    summary(model, (1, 128, 128), device=device)

# Let's now train and test our model for multiple epochs:
train_sampler = BatchSampler(
    batch_size=batch_size, dataset=train_dataset, balanced=args.balanced_batches
)
test_sampler = BatchSampler(
    batch_size=100, dataset=test_dataset, balanced=args.balanced_batches
)

mean_losses_train: List[torch.Tensor] = []
mean_losses_test: List[torch.Tensor] = []

for e in range(n_epochs):
    if activeloop:
        # Training:
        losses = train_model(model, train_sampler, optimizer, loss_function, device)
        # Calculating and printing statistics:
        mean_loss = sum(losses) / len(losses)
        mean_losses_train.append(mean_loss)
        print(f"\nEpoch {e + 1} training done, loss on train set: {mean_loss}\n")

        # Testing:
        # losses, y_pred_probs = test_model(model, test_sampler, loss_function, device)
        fpr = {x: [] for x in range(6)}
        tpr = {x: [] for x in range(6)}
        auc = {}

        losses, y_pred_probs = test_model(model, test_sampler, loss_function, device, fpr, tpr, auc)

        # Calculating and printing statistics:
        mean_loss = sum(losses) / len(losses)
        mean_losses_test.append(mean_loss)
        print(f"\nEpoch {e + 1} testing done, loss on test set: {mean_loss}\n")

        print(auc)

        ### Plotting during training
        plotext.clf()
        plotext.scatter(mean_losses_train, label="train")
        plotext.scatter(mean_losses_test, label="test")
        plotext.title("Train and test loss")

        plotext.xticks([i for i in range(len(mean_losses_train) + 1)])

        plotext.show()


##################################################################################################################
#                                         R O C   C U R V E S
##################################################################################################################
# NOTE: If you would like to run the ROC function for the Binary dataset, you need to comment the following code
# and uncomment the # ROC CURVE FOR BINARY block. Additionally, in order to make the Binary ROC curve, you need
# to comment some lines in the train_test.py file, as specified there. Please check that file.

# ROC CURVE FOR MULTICLASS
plt.figure(figsize=(8, 6))

colors = plt.cm.get_cmap('viridis', 6).colors
class_names = ['Class 0 (Atelectasis)', 'Class 1 (Effusion)', 'Class 2 (Infiltration)', 'Class 3 (No Finding)', 'Class 4 (Nodule)', 'Class 5 (Pneumonia)']

for i, color in zip(range(6), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=2, label='{} (AUC = {:.2f})'.format(class_names[i], auc[i]))

plt.plot([0, 1], [0, 1], color='gray', lw=1, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves for 6 Classes')
plt.legend(loc="lower right")
plt.show()

# ROC CURVE FOR BINARY
# for i in range(2):
#     plt.plot(fpr[i], tpr[i], label=f'Class {i} (AUC = {auc[i]:.2f})')

# plt.plot([0, 1], [0, 1], 'k--', label='Random chance')
# plt.xlabel('False Positive Rate')
# plt.ylabel('True Positive Rate')
# plt.title('ROC Curves for 2 Classes (1=Sick, 0=Non-sick)')
# plt.legend(loc='lower right')
# plt.show()


# retrieve current time to label artifacts
now = datetime.now()
# check if model_weights/ subdir exists
if not Path("model_weights/").exists():
    os.mkdir(Path("model_weights/"))

# Saving the model
torch.save(model.state_dict(), f"model_weights/model_{now.month:02}{now.day:02}{now.hour}_{now.minute:02}.txt")

# Create plot of losses
figure(figsize=(9, 10), dpi=80)
fig, (ax1, ax2) = plt.subplots(2, sharex=True)

ax1.plot(range(1, 1 + n_epochs), [x.detach().cpu() for x in mean_losses_train], label="Train", color="blue")
ax2.plot(range(1, 1 + n_epochs), [x.detach().cpu() for x in mean_losses_test], label="Test", color="red")
fig.legend()

# Check if /artifacts/ subdir exists
if not Path("artifacts/").exists():
    os.mkdir(Path("artifacts/"))

# save plot of losses
fig.savefig(Path("artifacts") / f"session_{now.month:02}{now.day:02}{now.hour}_{now.minute:02}.png")


##################################################################################################################
#             C O N F U S I O N   M A T R I X   &   C L A S S I F I C A T I O N   R E P O R T
##################################################################################################################
true_labels = test_dataset.get_labels()

# Set the model to evaluation mode
model.eval()

predicted_labels = []
with torch.no_grad():
    for inputs, _ in test_dataset:
        inputs = inputs.unsqueeze(0).to(device)

        outputs = model(inputs)

        # Get predicted labels by taking the max value (i.e. the most likely class)
        _, predicted = torch.max(outputs, 1)
        predicted_labels.extend(predicted.cpu().numpy())

# Calculate Confusion Matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)

print("Confusion Matrix:")
print(conf_matrix)
# plot the confusion matrix
# fig, ax = plt.subplots()
# ConfusionMatrixDisplay(confusion_matrix=conf_matrix).plot(ax=ax, cmap="Blues")
# plt.show()
# plt.savefig('confusion_matrix.png')
create_confusion_matrix(true_labels, predicted_labels)

# Classification report (accuracy, precision, f1 etc.)
class_report = classification_report(true_labels, predicted_labels)
print("\nClassification Report:")
print(class_report)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "--nb_epochs", help="number of training iterations", default=1, type=int)
    parser.add_argument("--batch_size", help="batch_size", default=25, type=int)
    parser.add_argument(
        "--balanced_batches",
        help="whether to balance batches for class labels",
        # NOTE: argparse's type=bool treats any non-empty string (including "False") as True;
        # edit the default here rather than passing the flag on the command line.
        default=True,
        type=bool,
    )
    args = parser.parse_args()

    main(args)
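For reference on the multiclass ROC block above: each `auc[i]` is the area under that class's (fpr, tpr) curve, and `sklearn.metrics.auc` computes it with the trapezoidal rule. A minimal sketch on hypothetical points (the array values below are made up for illustration):

```python
import numpy as np

# Hypothetical (fpr, tpr) points for one class, in the shape that
# fpr[i] / tpr[i] hold after test_model fills them in.
fpr_i = np.array([0.0, 0.1, 0.4, 1.0])
tpr_i = np.array([0.0, 0.6, 0.8, 1.0])

# Trapezoidal area under the ROC curve (what sklearn.metrics.auc computes).
auc_i = float(np.sum((fpr_i[1:] - fpr_i[:-1]) * (tpr_i[1:] + tpr_i[:-1]) / 2))
print(round(auc_i, 3))  # → 0.78
```

An AUC of 0.5 corresponds to the gray diagonal (random chance) drawn in the plot.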
dc1/model_weights/model_02_25_16_52.txt ADDED
Binary file (333 kB).
 
dc1/net.py ADDED
@@ -0,0 +1,113 @@
import torch
import torch.nn as nn
import torchvision.models as models

# Improved Net
class Net(nn.Module):
    def __init__(self, n_classes: int) -> None:
        super(Net, self).__init__()

        self.cnn_layers = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2),
            nn.Dropout(p=0.5),

            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2),
            nn.Dropout(p=0.25),

            nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2),
            nn.Dropout(p=0.125),

            nn.Conv2d(16, 8, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2),
            nn.Dropout(p=0.1),

            nn.Conv2d(8, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(p=0.05),

            # New layer
            nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(p=0.05),
        )

        self.linear_layers = nn.Sequential(
            nn.Linear(16, 256),
            nn.Linear(256, n_classes)
        )

    # Defining the forward pass
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cnn_layers(x)
        # After our convolutional layers, which are 2D, we need to flatten our
        # input to be 1-dimensional, as the linear layers require this.
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x


# Implementing Pre-Trained ResNet as a class
class ResNetModel(nn.Module):
    def __init__(self, n_classes: int, pretrained: bool = True):
        # Loading a pre-trained ResNet model
        super(ResNetModel, self).__init__()
        self.resnet = models.resnet34(pretrained=pretrained)
        self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=4, stride=(2, 2), padding=(3, 3), bias=False)
        num_ftrs = self.resnet.fc.in_features
        self.resnet.fc = nn.Linear(num_ftrs, n_classes)

    def forward(self, x):
        # Forward pass through the ResNet model
        return self.resnet(x)

# Implementing the EfficientNet model with efficientnet_b0
class EfficientNetModel(nn.Module):
    def __init__(self, n_classes: int, version: str = 'b0', pretrained: bool = True):
        super(EfficientNetModel, self).__init__()
        # Loading a pretrained EfficientNet model
        self.efficientnet = models.efficientnet_b0(pretrained=pretrained) if version == 'b0' else models.__dict__[f'efficientnet_{version}'](pretrained=pretrained)

        # Adjusting the classifier to match the number of classes
        num_ftrs = self.efficientnet.classifier[1].in_features
        self.efficientnet.classifier[1] = nn.Linear(num_ftrs, n_classes)

    def forward(self, x):
        # Forward pass through the EfficientNet model
        # Replicating the grayscale channel to have 3 channels
        x = x.repeat(1, 3, 1, 1)
        return self.efficientnet(x)

# Implementing the EfficientNet model with efficientnet_b7
# NOTE: This model takes a lot of time to run
class EfficientNetModel_b7(nn.Module):
    def __init__(self, n_classes: int, version: str = 'b7', pretrained: bool = True):
        super(EfficientNetModel_b7, self).__init__()
        # Loading a pretrained EfficientNet model
        self.efficientnet = models.efficientnet_b7(pretrained=pretrained) if version == 'b7' else models.__dict__[f'efficientnet_{version}'](pretrained=pretrained)

        # Adjusting the classifier to match the number of classes
        num_ftrs = self.efficientnet.classifier[1].in_features
        self.efficientnet.classifier[1] = nn.Linear(num_ftrs, n_classes)

    def forward(self, x):
        # Forward pass through the EfficientNet model
        # Replicating the grayscale channel to have 3 channels
        x = x.repeat(1, 3, 1, 1)
        return self.efficientnet(x)

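A quick sanity check for the `nn.Linear(16, 256)` input size in `Net` above (a sketch, assuming the 1×128×128 inputs that `summary(model, (1, 128, 128))` uses in `main.py`): the padded 3×3 convolutions preserve height and width, and each of the six `kernel_size=2` pooling layers halves them, so the flattened feature count works out as:

```python
# Flattened feature count after Net's cnn_layers for a 1x128x128 input:
# six pooling layers with kernel_size=2 halve H and W each time,
# and the last conv block outputs 4 channels.
size = 128
for _ in range(6):
    size //= 2          # 128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2
flat_features = 4 * size * size
print(flat_features)  # → 16, matching nn.Linear(16, 256)
```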
dc1/net_BINARY.py ADDED
@@ -0,0 +1,48 @@
import torch
import torch.nn as nn
import torchvision.models as models


# ORIGINAL NET (from template)
class Net_BINARY(nn.Module):
    def __init__(self, n_classes: int) -> None:
        super(Net_BINARY, self).__init__()

        self.cnn_layers = nn.Sequential(
            # Defining a 2D convolution layer
            nn.Conv2d(1, 32, kernel_size=4, stride=1),
            nn.PReLU(),
            nn.BatchNorm2d(32),
            nn.ReLU6(inplace=True),
            nn.AvgPool2d(kernel_size=3),
            torch.nn.Dropout(p=0.5, inplace=True),
            # Defining another 2D convolution layer
            nn.Conv2d(32, 64, kernel_size=4, stride=1),
            nn.PReLU(),
            nn.BatchNorm2d(64),
            nn.ReLU6(inplace=True),
            nn.AvgPool2d(kernel_size=3),
            torch.nn.Dropout(p=0.25, inplace=True),
            # Defining another 2D convolution layer
            nn.Conv2d(64, 128, kernel_size=3, stride=1),
            nn.PReLU(),
            nn.BatchNorm2d(128),
            nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=3),
            torch.nn.Dropout(p=0.125, inplace=True),
        )

        self.linear_layers = nn.Sequential(
            nn.Linear(1152, 312),
            nn.Linear(312, n_classes)
        )

    # Defining the forward pass
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cnn_layers(x)
        # After our convolutional layers, which are 2D, we need to flatten our
        # input to be 1-dimensional, as the linear layers require this.
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x

dc1/perfomance_metrics.py ADDED
@@ -0,0 +1,148 @@
from pathlib import Path
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, roc_curve, auc, classification_report, RocCurveDisplay, roc_auc_score
from sklearn.preprocessing import label_binarize, LabelBinarizer
from train_test import test_model
import torch
from net import Net
from batch_sampler import BatchSampler
from image_dataset import ImageDataset
from itertools import cycle
from scipy import interp
import numpy as np
import seaborn as sns

# NOTE: File used in the beginning of the project. Please ignore!

def ConfusionMatrix(y_pred, y):
    # Obtaining the predicted data
    y_pred = y_pred.cpu()
    y = y.cpu()
    reshaped = y.reshape(-1)

    # Plot Confusion Matrix
    report = classification_report(y, y_pred, zero_division=1)
    print(report)
    conf = confusion_matrix(reshaped, y_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=conf)

    FP = conf.sum(axis=0) - np.diag(conf)
    FN = conf.sum(axis=1) - np.diag(conf)
    TP = np.diag(conf)
    TN = conf.sum() - (FP + FN + TP)

    return disp, FP, FN, TP, TN


def ROC(y_pred_prob, y_pred, y):
    prob_reshape = y_pred_prob.cpu().reshape(-1)
    y_pred = y_pred.cpu()
    reshaped = y.cpu().reshape(-1)
    y_pred_prob = y_pred_prob.cpu().numpy()  # Convert to NumPy array
    y_pred = y_pred.cpu().numpy()  # Convert to NumPy array
    y = y.cpu().numpy()

    binary = []
    for i in range(len(y_pred)):
        if (y_pred[i] == reshaped[i]):
            binary.append(1)
        else:
            binary.append(0)
    fpr, tpr, threshold = metrics.roc_curve(binary, prob_reshape[:86])
    roc_auc = metrics.auc(fpr, tpr)
    disp_roc = metrics.RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc)

    return disp_roc


# def ROC2(y_train, y_test, y_score):
#     unique, counts = np.unique(np.concatenate((y_train, y_test)), return_counts=True)
#     print(dict(zip(unique, counts)))

#     label_binarizer = LabelBinarizer().fit(y_train)

#     y_onehot_test = label_binarizer.transform(y_test)
#     n_classes = len(label_binarizer.classes_)

#     class_off_interest = 1
#     class_id = np.flatnonzero(label_binarizer.classes_ == class_off_interest)[0]

#     fig, ax = plt.subplots(figsize=(6, 6))
#     target_names = ["Atelectasis", "Effusion", "Infiltration", "No Finding", "Nodule", "Pneumothorax"]
#     colors = cycle(["purple", "darkorange", "cornflowerblue", "red", "green", "darkblue"])
#     for class_id, color in zip(range(n_classes), colors):
#         RocCurveDisplay.from_predictions(
#             y_onehot_test[:, class_id],
#             y_score[:, class_id],
#             name=f"ROC curve for {target_names[class_id]}",
#             color=color,
#             ax=ax
#         )

#     plt.plot([0, 1], [0, 1], "k--", label="ROC curve for chance level (AUC = 0.5)")

#     return fig

# def ROC2(y_true, y_pred_prob, n_classes):
#     lb = LabelBinarizer()
#     y_true_binarized = lb.fit_transform(y_true)  # Binarize y_true
#     print("Shape of y_pred_prob:", y_pred_prob.shape)

#     fig, ax = plt.subplots(figsize=(8, 6))  # Prepare a figure for plotting

#     # Iterate over each class to calculate ROC
#     for i in range(n_classes):
#         y_true_class = y_true_binarized[:, i]  # True labels for class i
#         y_pred_class = y_pred_prob[:, i]  # Predicted probabilities for class i

#         # Calculate ROC curve
#         fpr, tpr, thresholds = roc_curve(y_true_class, y_pred_class)
#         roc_auc = auc(fpr, tpr)

#         # Plot ROC curve
#         RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc).plot(ax=ax)

#     plt.title("Multiclass ROC Curve")
#     plt.show()

#     return fig

def ROC_multiclass(y_true, y_pred_prob, n_classes):
    # Binarize the output
    y_true = label_binarize(y_true, classes=[*range(n_classes)])
    fpr = dict()
    tpr = dict()
    roc_auc = dict()

    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_true[:, i], y_pred_prob[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Plot ROC curves
    plt.figure()
    colors = cycle(['blue', 'red', 'green', 'yellow', 'orange', 'purple'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=2,
                 label='ROC curve of class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))

    plt.plot([0, 1], [0, 1], 'k--', lw=2)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Multiclass ROC')
    plt.legend(loc="lower right")
    plt.show()

    return plt
@@ -0,0 +1,402 @@
 
# # # # Imports
# # # import torch
# # # import numpy as np
# # # import pandas as pd
# # # import matplotlib.pyplot as plt
# # # import seaborn as sns

# # # from sklearn.metrics import confusion_matrix, roc_curve, auc
# # # from typing import Callable, List, Tuple
# # # import torch.nn as nn
# # # from pathlib import Path
# # # import torch.nn.functional as F
# # # from yaml import FlowSequenceStartToken

# # Import files
# from image_dataset import ImageDataset
# from net import Net, ResNetModel, EfficientNetModel
# from train_test import train_model, test_model
# from batch_sampler import BatchSampler

# NOTE: File used in the very beginning of the project. Please ignore!

# maincolor = '#4a8cffff'
# secondcolor = '#e06666'

# # Train data
# labels_train_path = 'dc1/data/Y_train.npy'
# data_train_path = 'dc1/data/X_train.npy'
# # Test data
# labels_test_path = 'dc1/data/Y_test.npy'
# data_test_path = 'dc1/data/X_test.npy'


# y_train = np.load(labels_train_path)
# unique_labels = np.unique(y_train)
# data_train = np.load(data_train_path)


# # Data Verification to check if we all have everything good
# data_shape = data_train.shape
# data_type = data_train.dtype
# labels_shape = y_train.shape
# labels_type = y_train.dtype
# print(f"Data Shape: {data_shape}, Data Type: {data_type}")
# print(f"Labels Shape: {labels_shape}, Labels Type: {labels_type}")

# # Check the range and distribution of features
# data_range = (np.min(data_train), np.max(data_train))

# # Label Encoding in accordance to the diseases
# class_names_mapping = {
#     0: 'Atelectasis',
#     1: 'Effusion',
#     2: 'Infiltration',
#     3: 'No Finding',
#     4: 'Nodule',
#     5: 'Pneumonia'
# }

# print("Unique classes in the training set:")
# for class_id in unique_labels:
#     print(f"Class ID {class_id}: {class_names_mapping[class_id]}")

# # df for distribution analysis
# df_data_range = pd.DataFrame(data_train.reshape(data_train.shape[0], -1))

# ###################################################################
# ###########   A D V A N C E D   A N A L Y S I S   ###########
# ##################################################################

# # Y test data (labels)
# y_test = np.load(labels_test_path)

# # Initialize model (NET)
# n_classes = 6
# # NOTE : change the nn here!
# model = Net(n_classes=n_classes)
# # model = ResNetModel(n_classes=n_classes)
# # model = EfficientNetModel(n_classes=n_classes)

# # Device for test_model function call
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model.to(device)

# # Initialize the loss function
# loss_function = nn.CrossEntropyLoss()  # we can use another; this one I found on the internet but I was getting errors...

# # # Load test dataset w function
# # test_dataset = ImageDataset(Path("dc1/data/X_test.npy"), Path("dc1/data/Y_test.npy"))

# # # Initialize the BatchSampler
# # batch_size = 32
# # test_loader = BatchSampler(batch_size=batch_size, dataset=test_dataset, balanced=False)  # 'balanced' or not we can choose depending on what we want

# # # Function call
# # losses, predicted_labels, true_labels, probabilities = test_model(model, test_loader, loss_function, device)

# ##################### R O C   C U R V E #####################
# def plot_multiclass_roc_curve(y_true, y_scores, num_classes):
#     # Compute ROC curve and ROC area for each class
#     fpr = dict()
#     tpr = dict()
#     roc_auc = dict()

#     for i in range(num_classes):
#         fpr[i], tpr[i], _ = roc_curve(y_true[:, i], y_scores[:, i])
#         roc_auc[i] = auc(fpr[i], tpr[i])

#     # Plot all ROC curves
#     plt.figure()
#     for i in range(num_classes):
#         plt.plot(fpr[i], tpr[i], label=f'ROC curve of class {i} (area = {roc_auc[i]:.2f})')

#     plt.plot([0, 1], [0, 1], 'k--')
#     plt.xlim([0.0, 1.0])
#     plt.ylim([0.0, 1.05])
#     plt.xlabel('False Positive Rate')
#     plt.ylabel('True Positive Rate')
#     plt.title('Multiclass ROC Curve')
#     plt.legend(loc="lower right")
#     plt.show()

# # Calculate the probabilities for each class
# model_predictions = []
# model_probabilities = []
# model_probabilities = F.softmax(torch.tensor(model_predictions), dim=0).numpy()

# plot_multiclass_roc_curve(y_test_binarized, model_probabilities, n_classes)

# model.eval()  # Set the model to evaluation mode
# with torch.no_grad():  # Turn off gradients for the following block
#     for data, target in test_loader:
#         data, target = data.to(device), target.to(device)
#         output = model(data)

#         # Get class predictions
#         _, preds = torch.max(output, 1)
#         model_predictions.extend(preds.cpu().numpy())

#         # Get probabilities for the positive class
#         probs = F.softmax(output, dim=1)[:, 1]  # Adjust the index based on your positive class
#         model_probabilities.extend(probs.cpu().numpy())

# # # Specificity = Number of true negatives / (Number of true negatives + number of false positives) =
# # # = proportion of correctly identified individuals without the illness

# # def sensitivity_specificity(conf_matrix):
# #     num_classes = conf_matrix.shape[0]
# #     sensitivity = np.zeros(num_classes)
# #     specificity = np.zeros(num_classes)

# #     for i in range(num_classes):
# #         TP = conf_matrix[i, i]
# #         FN = sum(conf_matrix[i, :]) - TP
# #         FP = sum(conf_matrix[:, i]) - TP
# #         TN = conf_matrix.sum() - (TP + FP + FN)

# #         sensitivity[i] = TP / (TP + FN) if (TP + FN) != 0 else 0
# #         specificity[i] = TN / (TN + FP) if (TN + FP) != 0 else 0

# #     return sensitivity, specificity

# # from sklearn.preprocessing import label_binarize

# # # Binarize the labels for multiclass (suggestion of LLM)
# # y_test_binarized = label_binarize(y_test, classes=np.unique(y_test))


# # # Calculate sensitivity and specificity
# # sensitivity, specificity = sensitivity_specificity(y_test, model_predictions)
# # print(f"Sensitivity: {sensitivity}")
# # print(f"Specificity: {specificity}")


# # ##################################################################################################################################################################

# # # # Display the images, 1 for each class
# # # def display_images(images, titles, num_images):
# # #     plt.figure(figsize=(15, 5))
# # #     for i in range(num_images):
# # #         image = np.squeeze(images[i])  # squeeze to make it easy to print in 2d
# # #         plt.subplot(1, num_images, i + 1)
# # #         plt.imshow(image, cmap='gray')
# # #         plt.title(titles[i])
# # #         plt.axis('off')
# # #     plt.show()
296
+ # # >>>>>>> ab59272 (Net / ResNet / EfficientNet Experiments)
297
+ # # # data_train = np.load(data_train_path)
298
+
299
+
300
+ # # # # Data Verification to check if we all have everything good
301
+ # # # data_shape = data_train.shape
302
+ # # # data_type = data_train.dtype
303
+ # # # labels_shape = y_train.shape
304
+ # # # labels_type = y_train.dtype
305
+ # # # print(f"Data Shape: {data_shape}, Data Type: {data_type}")
306
+ # # # print(f"Labels Shape: {labels_shape}, Labels Type: {labels_type}")
307
+
308
+ # # # # Check the range and distribution of features
309
+ # # # data_range = (np.min(data_train), np.max(data_train))
310
+
311
+ # # # # Label Encoding in accordance to the diseases
312
+ # # # class_names_mapping = {
313
+ # # # 0: 'Atelectasis',
314
+ # # # 1: 'Effusion',
315
+ # # # 2: 'Infiltration',
316
+ # # # 3: 'No Finding',
317
+ # # # 4: 'Nodule',
318
+ # # # 5: 'Pneumonia'
319
+ # # # }
320
+
321
+ # # # print("Unique classes in the training set:")
322
+ # # # for class_id in unique_labels:
323
+ # # # print(f"Class ID {class_id}: {class_names_mapping[class_id]}")
324
+
325
+ # # # # df for distribution analysis
326
+ # # # df_data_range = pd.DataFrame(data_train.reshape(data_train.shape[0], -1))
327
+
328
+
329
+ # # # Calculate the probabilities for each class
330
+ # # model_predictions = []
331
+ # # model_probabilities = []
332
+ # # model_probabilities = F.softmax(torch.tensor(model_predictions), dim=0).numpy()
333
+
334
+ # # plot_multiclass_roc_curve(y_test_binarized, model_probabilities, n_classes)
335
+
336
+
+
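The commented block above calls a `sensitivity_specificity` helper that does not appear anywhere in this diff. For the binary case, a minimal pure-Python version of what such a helper would compute might look like this (the function name and signature are taken from the call above; the implementation itself is an assumption):

```python
def sensitivity_specificity(y_true, y_pred):
    """Binary sensitivity (recall on positives) and specificity (recall on negatives)."""
    # Count the four confusion-matrix cells, with 1 as the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # TP / (TP + FN)
    specificity = tn / (tn + fp)  # TN / (TN + FP)
    return sensitivity, specificity

# Toy example: 2 of 3 positives and 2 of 3 negatives predicted correctly
sens, spec = sensitivity_specificity([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0])
```

In a real run, `y_test` and `model_predictions` from the evaluation loop above would be passed in place of the toy lists.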
dc1/py.typed ADDED
File without changes
dc1/train_test.py ADDED
@@ -0,0 +1,102 @@
+ from tqdm import tqdm
+ import torch
+ import torch.nn as nn
+ import numpy as np
+ from torch.nn import functional as F
+
+ from net import Net, ResNetModel, EfficientNetModel, EfficientNetModel_b7
+ from batch_sampler import BatchSampler
+ from image_dataset import ImageDataset
+
+ from typing import Callable, List, Tuple
+
+ from sklearn.metrics import roc_curve, auc
+ from sklearn.preprocessing import label_binarize
+ import matplotlib.pyplot as plt
+
+
+ def train_model(
+     model: Net,  # CHANGE NN HERE! (e.g. ResNetModel, EfficientNetModel)
+     train_sampler: BatchSampler,
+     optimizer: torch.optim.Optimizer,
+     loss_function: Callable[..., torch.Tensor],
+     device: str,
+ ) -> List[torch.Tensor]:
+     # Let's keep track of all the losses:
+     losses = []
+     # Put the model in train mode:
+     model.train()
+     # Feed all the batches one by one:
+     for batch in tqdm(train_sampler):
+         # Get a batch:
+         x, y = batch
+         # Make sure our samples are stored on the same device as our model:
+         x, y = x.to(device), y.to(device)
+         # Get predictions:
+         predictions = model.forward(x)
+         loss = loss_function(predictions, y)
+         losses.append(loss)
+         # We first need to make sure we reset our optimizer at the start.
+         # We want to learn from each batch separately,
+         # not from the entire dataset at once.
+         optimizer.zero_grad()
+         # We now backpropagate our loss through our model:
+         loss.backward()
+         # We then make the optimizer take a step in the right direction.
+         optimizer.step()
+     return losses
+
+
+ def test_model(
+     model: Net,
+     test_sampler: BatchSampler,
+     loss_function: Callable[..., torch.Tensor],
+     device: str,
+     fpr,
+     tpr,
+     roc,
+ ) -> Tuple[List[torch.Tensor], List[np.ndarray]]:
+     # Set the model to evaluation mode:
+     model.eval()
+     losses = []
+     all_y_pred_probs = []
+     all_y_true = []
+
+     # We need to make sure we do not update our model based on the test data:
+     with torch.no_grad():
+         for (x, y) in tqdm(test_sampler):
+             # Make sure our samples are stored on the same device as our model:
+             x = x.to(device)
+             y = y.to(device)
+             prediction = model.forward(x)
+             loss = loss_function(prediction, y)
+             losses.append(loss)
+             probabilities = F.softmax(prediction, dim=1)
+             all_y_pred_probs.append(probabilities.cpu().numpy())
+             all_y_true.extend(y.cpu().numpy())
+
+     y_pred_probs = np.concatenate(all_y_pred_probs, axis=0)
+     y_true = np.array(all_y_true)
+
+     # NOTE: Comment out this loop and uncomment the binary-class loop below
+     # in order to see the correct binary ROC curve.
+     # Compute the ROC curve and ROC area for each class:
+     for i in range(6):  # 6 classes
+         a, b, _ = roc_curve(y_true == i, y_pred_probs[:, i])
+         fpr[i].extend(a)
+         tpr[i].extend(b)
+         roc[i] = auc(fpr[i], tpr[i])
+
+     # # ROC for binary classification:
+     # for i in range(2):
+     #     fpr[i], tpr[i], _ = roc_curve(y_true == i, y_pred_probs[:, i])
+     #     roc[i] = auc(fpr[i], tpr[i])
+
+     return losses, y_pred_probs
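The per-class loop at the end of `test_model` can be exercised on its own with synthetic predictions. This sketch mirrors the one-vs-rest `roc_curve`/`auc` calls above; the class count, the random data, and the plain-dict containers standing in for the repo's `fpr`/`tpr`/`roc` arguments are all assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 6

# Synthetic ground truth and a softmax-like probability matrix,
# biased so the true class tends to get the highest probability
y_true = rng.integers(0, n_classes, size=n_samples)
logits = rng.normal(size=(n_samples, n_classes))
logits[np.arange(n_samples), y_true] += 2.0
y_pred_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

fpr, tpr, roc = {}, {}, {}
for i in range(n_classes):  # one-vs-rest ROC per class, as in test_model
    fpr[i], tpr[i], _ = roc_curve(y_true == i, y_pred_probs[:, i])
    roc[i] = auc(fpr[i], tpr[i])
```

Because the logits are biased toward the true class, every per-class AUC should land well above the 0.5 chance level.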
dc1/visualise_performance_metrics.py ADDED
@@ -0,0 +1,24 @@
+ from pathlib import Path
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import torch
+ from itertools import cycle
+
+ from sklearn.metrics import (
+     confusion_matrix,
+     ConfusionMatrixDisplay,
+     classification_report,
+     roc_curve,
+     auc,
+     RocCurveDisplay,
+     roc_auc_score,
+ )
+ from sklearn.preprocessing import label_binarize, LabelBinarizer
+
+ from train_test import test_model
+ from net import Net
+ from batch_sampler import BatchSampler
+ from image_dataset import ImageDataset
+
+
+ def create_confusion_matrix(true_labels, predicted_labels):
+     cm = confusion_matrix(true_labels, predicted_labels)
+     # Display it as a heatmap:
+     disp = ConfusionMatrixDisplay(confusion_matrix=cm)
+     disp.plot(cmap=plt.cm.Blues)
+     plt.title('Confusion Matrix')
+     plt.show()
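`create_confusion_matrix` above delegates the counting to scikit-learn and only handles plotting. The matrix it displays can be sketched by hand to make the convention explicit (the helper name `confusion_counts` is hypothetical, not part of the repo):

```python
import numpy as np

def confusion_counts(true_labels, predicted_labels, n_classes):
    # cm[i, j] = number of samples with true class i predicted as class j
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

# Toy example with 3 classes: rows are true labels, columns are predictions
cm = confusion_counts([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0], n_classes=3)
```

The diagonal holds the correct predictions, which is exactly what the heatmap in `create_confusion_matrix` highlights.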
mypy.ini ADDED
@@ -0,0 +1,2 @@
+ [mypy]
+ exclude = venv
requirements.txt ADDED
@@ -0,0 +1,27 @@
+ charset_normalizer==2.1.1
+ certifi==2023.7.22
+ colorama==0.4.6
+ fonttools==4.43.0
+ idna==3.4
+ kiwisolver==1.4.4
+ matplotlib==3.6.2
+ mypy==0.991
+ numpy==1.24.1
+ packaging==22.0
+ pillow==10.2.0
+ plotext==5.2.8
+ pyparsing==3.0.9
+ python-dateutil==2.8.2
+ requests==2.31.0
+ six==1.16.0
+ torch==1.13.1
+ torchsummary==1.5.1
+ tqdm==4.64.1
+ types-requests==2.28.11.7
+ types-setuptools==65.6.0.2
+ types-tqdm==4.64.1
+ urllib3==1.26.18
+ torchvision==0.17.1
+
+ scikit-learn~=1.4.1.post1
+ setuptools~=58.1.0
setup.py ADDED
@@ -0,0 +1,17 @@
+ from setuptools import setup
+ from pathlib import Path
+
+ with open(Path("requirements.txt"), "r") as requirements:
+     dependencies = requirements.readlines()
+
+ setup(
+     name='Data-Challenge-1-template',
+     version='1.0.0',
+     packages=['dc1'],
+     url='',
+     license='',
+     author='',
+     author_email='',
+     description='',
+     install_requires=dependencies,
+ )
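One small caveat about `setup.py` above: `readlines()` returns each requirement with its trailing newline, and blank lines pass through as well. setuptools tolerates this, but stripping gives a cleaner `install_requires` list. A sketch against a throwaway requirements file (the two pinned packages here are stand-ins, not the real list):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny stand-in requirements file, including a blank line
    req = Path(tmp) / "requirements.txt"
    req.write_text("numpy==1.24.1\ntqdm==4.64.1\n\n")
    # Strip trailing newlines and drop blank lines before handing
    # the entries to setuptools
    with open(req, "r") as requirements:
        dependencies = [line.strip() for line in requirements if line.strip()]
```

`dependencies` can then be passed to `install_requires` exactly as in the `setup()` call above.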