CBIS-DDSM-CNN
CBIS-DDSM-CNN is a deep learning model based on a Convolutional Neural Network (CNN) designed to detect breast cancer from mammographic images. It was trained on the Curated Breast Imaging Subset of the DDSM (CBIS-DDSM) dataset, a widely used dataset in medical imaging research.
The model classifies mammograms into cancerous and non-cancerous categories, aiding in early detection and diagnosis.
Model Details
Model Description
- Developed by: Lorenzo Maiuri
- Funded by: None
- Shared by: Lorenzo Maiuri
- Model type: Image Classification
- License: MIT
Model Sources
- Repository: Hugging Face Model Repository
- Dataset: CBIS-DDSM (Curated Breast Imaging Subset of DDSM)
- Dataset: Breast Histopathology Images
- Kaggle Notebook: Link to Kaggle Notebook
- Demo: Coming soon...
Uses
Try It Out
Coming soon...
Direct Use
from huggingface_hub import hf_hub_download
import tensorflow as tf
import cv2
import numpy as np
import json
import matplotlib.pyplot as plt

# Load model
repo_id = "maiurilorenzo/CBIS-DDSM-CNN"
model_path = hf_hub_download(repo_id=repo_id, filename="CNN_model.h5")
model = tf.keras.models.load_model(model_path)

# Load preprocessing info (e.g. the target image size used during training)
preprocessing_path = hf_hub_download(repo_id=repo_id, filename="preprocessing.json")
with open(preprocessing_path, "r") as f:
    preprocessing_info = json.load(f)

# Define preprocessing function
def load_and_preprocess_image(image_path):
    """Read an image, convert BGR to RGB, resize it and scale pixels to [0, 1]."""
    try:
        img = cv2.imread(image_path, cv2.IMREAD_COLOR)
        if img is None:
            raise ValueError(f"Could not read image: {image_path}")
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
        # cv2.resize expects dsize as (width, height)
        img = cv2.resize(img, tuple(preprocessing_info["target_size"]), interpolation=cv2.INTER_AREA)
        img_array = img.astype(np.float32) / 255.0
        return img_array
    except Exception as e:
        print(f"Error processing {image_path}: {str(e)}")
        return None

# Load and preprocess an example image (replace with the path to your own image)
image_path = "/kaggle/input/miniddsm2/MINI-DDSM-Complete-JPEG-8/Benign/0029/C_0029_1.LEFT_CC.jpg"
img_array = load_and_preprocess_image(image_path)

if img_array is not None:
    img_batch = np.expand_dims(img_array, axis=0)  # add a batch dimension: (1, H, W, C)
    predictions = model.predict(img_batch)
    cancer_probability = predictions[0][0]  # Assuming "Cancer" is the first class
    predicted_class = "Cancer" if cancer_probability >= 0.5 else "Normal"

    plt.imshow(img_array)
    plt.title(f'Predicted Class: {predicted_class}\nProbability of Cancer: {cancer_probability:.4f}')
    plt.axis('off')
    plt.show()
else:
    print("Image loading and preprocessing failed.")
Downstream Use
- Medical Research: Can be used to assist in studying breast cancer detection techniques.
- Computer-Aided Diagnosis (CAD) Systems: May serve as a component in automated screening tools (not for clinical use).
- Model Benchmarking: Can serve as a baseline for transfer learning in medical imaging.
- Educational Purposes: Suitable for learning about deep learning applications in medical imaging.
Out-of-Scope Use
🚨 Not for clinical diagnosis! This model should not be used in real-world medical decision-making without further validation & regulatory approval. It is intended for research and educational purposes only.
Bias, Risks, and Limitations
- Dataset Bias: The model is trained on Breast Histopathology Images, which may not fully represent all patient demographics.
- False Positives/Negatives: Misclassification can occur, highlighting the need for human review in medical practice.
- Limited Generalization: Performance may degrade on datasets from different imaging devices or institutions.
- Ethical Concerns: AI in medical imaging should be deployed transparently and with clinical oversight to avoid unintended harm.
Recommendations
- Larger, more diverse training data: Pre-train on larger and more diverse datasets to improve generalization across patient populations.
- Explainability tools: Use methods such as Grad-CAM or SHAP to help radiologists interpret predictions.
- Continuous evaluation: Validate on real-world clinical data before any integration into healthcare systems.
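Grad-CAM builds a heatmap by weighting each feature map of the last convolutional layer with the spatially averaged gradient of the class score, then keeping only the positive part. The NumPy sketch below shows only that combination step. It assumes the feature maps and their gradients have already been extracted from the Keras model (for example with `tf.GradientTape`), which is not shown; the function name `grad_cam_heatmap` is hypothetical.

```python
import numpy as np

def grad_cam_heatmap(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine conv-layer activations and their gradients into a Grad-CAM heatmap.

    feature_maps: activations A of the last conv layer, shape (H, W, K).
    gradients:    d(class score)/dA, same shape as feature_maps.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # One weight per channel: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(0, 1))  # shape (K,)
    # Weighted sum over channels, then ReLU to keep only positive evidence.
    cam = np.maximum(feature_maps @ weights, 0.0)  # shape (H, W)
    # Scale to [0, 1] so the map can be overlaid on the input image.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Usage with random stand-in tensors (hypothetical shapes).
rng = np.random.default_rng(0)
heatmap = grad_cam_heatmap(rng.random((6, 6, 8)), rng.standard_normal((6, 6, 8)))
print(heatmap.shape)
```

The heatmap has the spatial resolution of the chosen conv layer, so it is usually upsampled to the input size before being overlaid on the mammogram.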
Training Details
Training Data
- Dataset: Breast Histopathology Images
- Image Types: High-resolution mammograms
- Classes: Cancerous (Malignant), Non-Cancerous (Benign/Normal)
- Annotations: Region of Interest (ROI) bounding boxes & BI-RADS assessments
Training Procedure
- Model Architecture: CNN (4 Convolutional layers + BatchNorm + Dropout)
- Loss Function: Categorical Cross-Entropy
- Optimizer: Adam
- Validation Split: 20%
- Callbacks: Early Stopping, ReduceLROnPlateau
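Categorical cross-entropy is normally paired with a softmax output layer and one-hot targets: for each image the loss is the negative log of the probability assigned to the true class, averaged over the batch. A small NumPy sketch of that computation follows; it is an illustration, not the training code, and the class-index order in the toy batch is arbitrary.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the per-row max before exponentiating for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-7) -> float:
    """Batch mean of -sum_c y_c * log(p_c), with one-hot y_true of shape (N, C)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-(y_true * np.log(y_prob)).sum(axis=-1).mean())

# Toy batch: 3 images, 2 classes (index order is arbitrary here).
labels = np.array([0, 1, 1])
y_true = np.eye(2)[labels]  # one-hot targets, shape (3, 2)
logits = np.array([[2.0, 0.0], [0.5, 1.5], [0.0, 3.0]])
loss = categorical_cross_entropy(y_true, softmax(logits))
print(loss)
```

A model that always outputs 50/50 scores ln 2 ≈ 0.693 on a two-class problem, which is a handy sanity check for the first epochs.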
Preprocessing
- Grayscale conversion for reduced complexity
- Contrast enhancement for better lesion visibility
- Image resizing to (50, 50) pixels
- Normalization (scaling pixel values between 0 and 1)
- Data augmentation (flipping, rotation, zooming) to improve generalization
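The preprocessing steps above can be sketched as a dependency-free NumPy pipeline. This does not reproduce the exact training code: the card does not name its contrast-enhancement method, so percentile contrast stretching is used as an assumed stand-in; nearest-neighbour index sampling replaces a library resize; and augmentation is limited to flips and 90-degree rotations (arbitrary-angle rotation and zoom are omitted). Note that the Direct Use example feeds RGB images, so check `model.input_shape` before switching inference to grayscale input.

```python
import numpy as np

TARGET_SIZE = (50, 50)  # (height, width), as stated in the card

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    # ITU-R BT.601 luma weights (the ones OpenCV uses for RGB -> gray).
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def stretch_contrast(gray: np.ndarray, low_pct: float = 2.0, high_pct: float = 98.0) -> np.ndarray:
    # Assumed method: map the [low_pct, high_pct] percentile range onto [0, 255].
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:  # flat image, nothing to stretch
        return gray
    return np.clip((gray - lo) / (hi - lo) * 255.0, 0.0, 255.0)

def resize_nearest(img: np.ndarray, size=TARGET_SIZE) -> np.ndarray:
    # Nearest-neighbour resize by sampling row/column indices.
    rows = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
    return img[rows[:, None], cols]

def preprocess(rgb_uint8: np.ndarray) -> np.ndarray:
    gray = to_grayscale(rgb_uint8.astype(np.float64))  # 1. grayscale
    gray = stretch_contrast(gray)                       # 2. contrast enhancement
    small = resize_nearest(gray, TARGET_SIZE)           # 3. resize to 50 x 50
    return (small / 255.0).astype(np.float32)           # 4. scale to [0, 1]

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Random horizontal/vertical flips plus a random multiple of 90 degrees.
    if rng.random() < 0.5:
        img = np.fliplr(img)
    if rng.random() < 0.5:
        img = np.flipud(img)
    return np.rot90(img, k=int(rng.integers(4)))

# Usage on a random stand-in image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(120, 80, 3)).astype(np.uint8)
x = augment(preprocess(image), rng)
print(x.shape, x.dtype)
```

Augmentation is applied only to training images; validation and test images go through `preprocess` alone.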
Training Hyperparameters
- Epochs: 20
- Batch Size: 75
- Learning Rate: 0.001
- Optimizer: Adam
- Dropout Rate: 0.4
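Early Stopping and ReduceLROnPlateau both watch the validation loss during the 20-epoch budget, starting from the 0.001 learning rate. The sketch below replays a list of per-epoch validation losses through simplified versions of both rules to show how they interact. It is not the Keras implementation: the patience, factor and `min_lr` values are assumptions (the card does not report them), and Keras options such as `min_delta`, `cooldown` and `restore_best_weights` are left out.

```python
def simulate_callbacks(val_losses, lr=0.001, max_epochs=20,
                       es_patience=5, lr_patience=3, lr_factor=0.5, min_lr=1e-6):
    """Replay per-epoch validation losses through simplified EarlyStopping and
    ReduceLROnPlateau rules. Returns (epochs_run, learning_rate_used_each_epoch)."""
    best = float("inf")
    stale_for_stop = 0  # epochs without improvement (early-stopping counter)
    stale_for_lr = 0    # same, but reset after every learning-rate cut
    lr_history = []
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        lr_history.append(lr)
        if loss < best:
            best = loss
            stale_for_stop = stale_for_lr = 0
        else:
            stale_for_stop += 1
            stale_for_lr += 1
        if stale_for_lr >= lr_patience:
            lr = max(lr * lr_factor, min_lr)  # reduce LR on plateau
            stale_for_lr = 0
        if stale_for_stop >= es_patience:
            break  # stop training early
    return epochs_run, lr_history

# Hypothetical validation curve that improves, then plateaus.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74, 0.75, 0.76, 0.77]
epochs_run, lrs = simulate_callbacks(losses)
print(epochs_run, lrs)
```

With these assumed settings the learning rate is halved after epoch 6 (three epochs without improvement) and training stops after epoch 8, well before the 20-epoch limit.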
Speeds, Sizes, Times
- Total Training Time: 33 minutes
- Hardware Used: Tesla P100
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on the test split of the CBIS-DDSM dataset.
Metrics
The following metrics were computed for evaluation:
- Accuracy
- Confusion Matrix
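Both metrics come from the true and predicted class indices: the confusion matrix tallies every (true, predicted) pair, and accuracy is the fraction of samples on its diagonal. A NumPy sketch with made-up labels (not this model's results); scikit-learn's `confusion_matrix` and `accuracy_score` compute the same quantities with the same row/column convention.

```python
import numpy as np

def confusion_matrix(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int = 2) -> np.ndarray:
    """Counts with rows = true class and columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # increment cm[t, p] for each (t, p) pair
    return cm

def accuracy_from_cm(cm: np.ndarray) -> float:
    # Correct predictions sit on the diagonal.
    return float(np.trace(cm) / cm.sum())

# Hypothetical labels and predictions.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[2 1]
           #  [1 2]]
print(accuracy_from_cm(cm))
```

The off-diagonal cells separate false positives from false negatives, which matters in screening because the two errors carry very different costs.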
Results
- Accuracy: 0.9789
Summary
The model reaches an accuracy of 0.9789 on the test split. Because accuracy alone can mask uneven error rates between classes, the confusion matrix should be checked to see how misclassifications divide between false positives and false negatives.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: Tesla P100
- Hours used: 0.33
- Cloud Provider: Kaggle
- Carbon Emitted: 0.04 kg CO2eq
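The calculator's estimate reduces to energy (power draw × time) multiplied by the carbon intensity of the electricity grid. The arithmetic as a sketch with assumed inputs: 250 W is the TDP of the PCIe Tesla P100, and 0.432 kg CO2eq/kWh is a placeholder grid intensity, not the value used for this card. The function name is hypothetical.

```python
def estimate_emissions_kg(power_watts: float, hours: float, kg_co2eq_per_kwh: float) -> float:
    """Energy (kWh) = power (kW) x time (h); emissions = energy x grid carbon intensity."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2eq_per_kwh

# Assumed inputs: P100 PCIe TDP and a placeholder grid intensity.
print(round(estimate_emissions_kg(250, 0.33, 0.432), 4))  # prints 0.0356
```

Because the estimate is linear in time, using 0.55 h (33 minutes) instead of 0.33 h scales the result by the same ratio.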
Citation
If you use this model, please cite it as follows:
@misc{CBIS-DDSM-CNN,
author = {Lorenzo Maiuri},
title = {CBIS-DDSM-CNN},
year = {2025},
publisher = {Hugging Face Hub},
license = {MIT}
}