CBIS-DDSM-CNN

CBIS-DDSM-CNN is a deep learning model based on a Convolutional Neural Network (CNN) designed to detect breast cancer from mammographic images. It was trained on the Curated Breast Imaging Subset of the DDSM (CBIS-DDSM) dataset, a widely used dataset in medical imaging research.

The model classifies mammograms into cancerous and non-cancerous categories, aiding in early detection and diagnosis.

Model Details

Model Description

Developed by: Lorenzo Maiuri
Funded by: None (unfunded)
Shared by: Lorenzo Maiuri
Model type: Image Classification
License: MIT

Model Sources

Repository: Hugging Face Model Repository
Dataset: CBIS-DDSM (Curated Breast Imaging Subset DDSM)
Dataset: Breast Histopathology Images
Kaggle Notebook: Link to Kaggle Notebook
Demo: Coming soon...

Uses

Try It Out

Coming soon...

Direct Use

from huggingface_hub import hf_hub_download
import tensorflow as tf
import cv2
import numpy as np
import json
import matplotlib.pyplot as plt

# Load model
repo_id = "maiurilorenzo/CBIS-DDSM-CNN"
model_path = hf_hub_download(repo_id=repo_id, filename="CNN_model.h5")
model = tf.keras.models.load_model(model_path)

# Load preprocessing info
preprocessing_path = hf_hub_download(repo_id=repo_id, filename="preprocessing.json")
with open(preprocessing_path, "r") as f:
    preprocessing_info = json.load(f)

# Define preprocessing function
def load_and_preprocess_image(image_path):
    """Read an image, convert it to RGB, resize it, and scale pixels to [0, 1].

    Returns a float32 array, or None if the image cannot be processed.
    """
    try:
        img = cv2.imread(image_path, cv2.IMREAD_COLOR)
        if img is None:
            raise ValueError(f"Could not read image: {image_path}")

        # OpenCV loads images as BGR; convert to RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # The target size is read from preprocessing.json
        img = cv2.resize(img, tuple(preprocessing_info["target_size"]), interpolation=cv2.INTER_AREA)
        img_array = img.astype(np.float32) / 255.0

        return img_array
    except Exception as e:
        print(f"Error processing {image_path}: {e}")
        return None

# Load and preprocess an example image (replace the path below with your own file)
image_path = "/kaggle/input/miniddsm2/MINI-DDSM-Complete-JPEG-8/Benign/0029/C_0029_1.LEFT_CC.jpg"
img_array = load_and_preprocess_image(image_path)

if img_array is not None:
    img_batch = np.expand_dims(img_array, axis=0)
    predictions = model.predict(img_batch)

    cancer_probability = predictions[0][0]  # Assuming "Cancer" is the first class
    predicted_class = "Cancer" if cancer_probability >= 0.5 else "Normal"

    plt.imshow(img_array)
    plt.title(f'Predicted Class: {predicted_class}\nProbability of Cancer: {cancer_probability:.4f}')
    plt.axis('off')
    plt.show()
else:
    print("Image loading and preprocessing failed.")
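
The snippet above reads `predictions[0][0]` as the cancer probability, which depends on how the output layer is laid out. The following is a minimal sketch of a helper that handles both a single sigmoid unit and a multi-unit softmax-style output. The function name `interpret_prediction`, the `cancer_index` parameter, and the class order are assumptions made for illustration; they are not confirmed by the model files, so check the training label encoding before relying on them.

```python
import numpy as np

def interpret_prediction(pred_row, cancer_index=0, threshold=0.5):
    """Map one row of model output to (label, cancer_probability).

    cancer_index is an assumption: verify it against the label encoding
    used during training.
    """
    row = np.asarray(pred_row, dtype=np.float64).ravel()
    if row.size == 1:
        # Single sigmoid unit: the value itself is taken as P(cancer).
        prob = float(row[0])
    else:
        # Multi-unit output: pick the column assumed to represent "Cancer".
        prob = float(row[cancer_index])
    label = "Cancer" if prob >= threshold else "Normal"
    return label, prob

print(interpret_prediction([0.82, 0.18]))
print(interpret_prediction([0.30]))
```

With the model loaded as above, this could be called as `interpret_prediction(predictions[0])`.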

Downstream Use

Medical Research: Can be used to assist in studying breast cancer detection techniques.
Computer-Aided Diagnosis (CAD) Systems: May serve as a component in automated screening tools (not for clinical use).
Model Benchmarking: Can serve as a baseline for transfer learning in medical imaging.
Educational Purposes: Suitable for learning about deep learning applications in medical imaging.

Out-of-Scope Use

🚨 Not for clinical diagnosis! This model should not be used in real-world medical decision-making without further validation and regulatory approval. It is intended for research and educational purposes only.

Bias, Risks, and Limitations

Dataset Bias: The model is trained on Breast Histopathology Images, which may not fully represent all patient demographics.
False Positives/Negatives: Misclassification can occur, highlighting the need for human review in medical practice.
Limited Generalization: Performance may degrade on datasets from different imaging devices or institutions.
Ethical Concerns: AI in medical imaging should be deployed transparently and with clinical oversight to avoid unintended harm.

Recommendations

Pre-training on larger, diverse datasets: To improve generalization across different patient populations.
Explainability tools: Such as Grad-CAM or SHAP to help radiologists interpret predictions.
Continuous evaluation: With real-world clinical data before integration into healthcare systems.

Training Details

Training Data

Dataset: Breast Histopathology Images
Image Types: High-resolution mammograms
Classes: Cancerous (Malignant), Non-Cancerous (Benign/Normal)
Annotations: Region of Interest (ROI) bounding boxes & BI-RADS assessments

Training Procedure

Model Architecture: CNN (4 Convolutional layers + BatchNorm + Dropout)
Loss Function: Categorical Cross-Entropy
Optimizer: Adam
Validation Split: 20%
Callbacks: Early Stopping, ReduceLROnPlateau
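
To illustrate how the two callbacks interact, here is a plain-Python sketch of their general behavior. It is not Keras's implementation, and the patience and factor values are hypothetical because the card does not state them; only the starting learning rate of 0.001 comes from this card. The function name `simulate_callbacks` is likewise made up for the example.

```python
def simulate_callbacks(val_losses, lr=0.001, lr_patience=2, lr_factor=0.5,
                       stop_patience=4):
    """Return (epochs_run, final_lr) for a sequence of validation losses.

    Simplified illustration: the learning rate is multiplied by lr_factor
    after lr_patience epochs without improvement, and training stops after
    stop_patience epochs without improvement.
    """
    best = float("inf")
    stale_stop = 0  # epochs without improvement, for early stopping
    stale_lr = 0    # epochs without improvement since the last LR change
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale_stop = 0
            stale_lr = 0
            continue
        stale_stop += 1
        stale_lr += 1
        if stale_lr >= lr_patience:
            lr *= lr_factor  # reduce the learning rate on a plateau
            stale_lr = 0
        if stale_stop >= stop_patience:
            return epoch, lr  # stop early
    return len(val_losses), lr

# Hypothetical validation losses that plateau after epoch 3.
print(simulate_callbacks([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64, 0.5]))
```

In this made-up run the learning rate is halved twice and training stops at epoch 7, before the final loss value is ever reached.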

Preprocessing

Grayscale conversion for reduced complexity
Contrast enhancement for better lesion visibility
Image resizing to (50, 50) pixels
Normalization (scaling pixel values between 0 and 1)
Data augmentation (flipping, rotation, zooming) to improve generalization
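
The steps listed above can be sketched with NumPy alone. This is an illustration under assumptions, not the training pipeline itself: the card does not say which contrast method was used, so min-max stretching stands in for it (and also performs the scaling to [0, 1]), resizing uses simple nearest-neighbour sampling, and only horizontal flipping is shown for augmentation. The names `preprocess` and `augment_flip` are hypothetical.

```python
import numpy as np

def preprocess(image_rgb, target_size=(50, 50)):
    """Grayscale -> contrast stretch to [0, 1] -> nearest-neighbour resize."""
    img = np.asarray(image_rgb, dtype=np.float32)
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights).
    gray = img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # Min-max contrast stretch; a stand-in for the unspecified enhancement.
    lo, hi = gray.min(), gray.max()
    stretched = (gray - lo) / (hi - lo) if hi > lo else np.zeros_like(gray)
    # Nearest-neighbour resize by sampling evenly spaced rows and columns.
    rows = np.arange(target_size[0]) * gray.shape[0] // target_size[0]
    cols = np.arange(target_size[1]) * gray.shape[1] // target_size[1]
    return stretched[np.ix_(rows, cols)]

def augment_flip(image, rng):
    """Flip the image horizontally with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image

rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(100, 120, 3))  # synthetic stand-in image
out = augment_flip(preprocess(fake_image), rng)
print(out.shape, out.dtype)
```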

Training Hyperparameters

Epochs: 20
Batch Size: 75
Learning Rate: 0.001
Optimizer: Adam
Dropout Rate: 0.4

Speeds, Sizes, Times

Total Training Time: 33 minutes
Hardware Used: Tesla P100

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on the test split of the CBIS-DDSM dataset.

Metrics

The following metrics were computed for evaluation:

Accuracy
Confusion Matrix
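
As a reminder of how these two metrics relate, the sketch below computes a confusion matrix and derives accuracy from it. The labels are made-up toy values, not the model's test results, and the helper names are chosen for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Count each (true, predicted) pair, including repeated pairs.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal (correct predictions)."""
    return np.trace(cm) / cm.sum()

# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(accuracy(cm))
```

The off-diagonal entries separate false positives from false negatives, which accuracy alone does not show.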

Results

Accuracy: 0.9789

Summary

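The model reaches an accuracy of 0.9789 on the CBIS-DDSM test split when classifying images as cancerous or non-cancerous. Accuracy alone does not show how errors split between false positives and false negatives, so it should be read alongside the confusion matrix.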

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Tesla P100
Hours used: 0.33
Cloud Provider: Kaggle
Carbon Emitted: 0.04
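
In broad terms, estimates of this kind multiply the energy drawn by the hardware by the carbon intensity of the electricity grid. The sketch below shows that arithmetic. The 250 W power draw and the 0.432 kg CO2eq/kWh grid intensity are assumed values for illustration, not figures taken from this card or confirmed as the ones used for the number above; only the 0.33 hours comes from the card.

```python
def estimate_emissions_kg(power_watts, hours, kg_co2_per_kwh):
    """Estimate emissions as energy (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2_per_kwh

# Assumed inputs: 250 W GPU power draw, 0.432 kg CO2eq per kWh.
estimate = estimate_emissions_kg(power_watts=250, hours=0.33, kg_co2_per_kwh=0.432)
print(f"{estimate:.3f} kg CO2eq")
```

This ignores CPU, memory, and data-center overhead, so it should be treated as a rough lower bound under the stated assumptions.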

Citation

If you use this model, please cite it as follows:

@misc{CBIS-DDSM-CNN,
  author = {Lorenzo Maiuri},
  title = {CBIS-DDSM-CNN},
  year = {2025},
  publisher = {Hugging Face Hub},
  license = {MIT}
}