πŸ” Gaussian Mixture Model (GMM) for Imbalanced Classification

This project implements a Gaussian Mixture Model (GMM)-based classifier designed to handle extremely imbalanced classification problems. It simulates real-world imbalance scenarios and benchmarks the approach on three public datasets.


🧠 Problem Statement

Many real-world classification tasks (e.g., fraud detection, rare disease diagnosis) suffer from minority class scarcity. Classical ML methods often fail because their decision boundaries end up biased toward the majority class.

This project demonstrates how GMM-based generative classifiers, when combined with intelligent imbalance handling (e.g., undersampling), can improve minority class detection, especially in low-data regimes.
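
The core idea, in a minimal sketch: fit one GaussianMixture per class and assign a new sample to the class whose model gives the higher log-likelihood. The snippet below only illustrates the technique; the component count, covariance type, and handling of class priors are placeholder choices, and the actual logic lives in gmm_classifier.py.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_per_class_gmms(X, y, n_components=2):
    # One generative model per class (placeholder hyperparameters)
    models = {}
    for c in np.unique(y):
        gmm = GaussianMixture(n_components=n_components, covariance_type="full", random_state=0)
        models[c] = gmm.fit(X[y == c])
    return models

def predict(models, X):
    # Compare per-class log-likelihoods; add log class priors here
    # if the training data was not (approximately) rebalanced first
    classes = sorted(models)
    scores = np.column_stack([models[c].score_samples(X) for c in classes])
    return np.asarray(classes)[scores.argmax(axis=1)]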


🧪 Datasets Used

  1. Breast Cancer Wisconsin Dataset (sklearn.datasets.load_breast_cancer)
  2. Credit Card Fraud Detection (OpenML 42175)
  3. Adult Income Dataset (OpenML 1590)
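
All three datasets can be fetched with scikit-learn; here is a minimal loading sketch that skips preprocessing (for example, the Adult dataset's categorical features need encoding before GMM fitting, which data_loader.py is responsible for):

from sklearn.datasets import load_breast_cancer, fetch_openml

# Breast Cancer Wisconsin (binary, mildly imbalanced)
X_bc, y_bc = load_breast_cancer(return_X_y=True)

# Credit Card Fraud Detection (OpenML data id 42175, highly imbalanced)
fraud = fetch_openml(data_id=42175, as_frame=True)

# Adult Income (OpenML data id 1590, mixed numeric/categorical features)
adult = fetch_openml(data_id=1590, as_frame=True)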

📊 Key Features

  • 🔍 GMM classifier per class
  • ⚖️ Controlled imbalance sampling (see the sketch below)
  • 📊 Evaluation: F1-macro, balanced accuracy
  • 🧪 Multi-dataset benchmark
  • 🚀 Hugging Face integration for model sharing
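
Controlled imbalance sampling here means keeping every minority sample and downsampling the majority class to a chosen ratio before fitting. A simple NumPy sketch of the idea follows; imbalance_sampler.py may differ in naming and defaults.

import numpy as np

def undersample(X, y, ratio=0.1, minority_label=1, seed=0):
    # Hypothetical helper: keep all minority samples and a random majority
    # subset so that n_minority / n_majority is roughly `ratio`
    rng = np.random.default_rng(seed)
    min_idx = np.where(y == minority_label)[0]
    maj_idx = np.where(y != minority_label)[0]
    n_keep = min(len(maj_idx), int(len(min_idx) / ratio))
    keep = np.concatenate([min_idx, rng.choice(maj_idx, size=n_keep, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]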

🚀 Usage

🔧 Install dependencies

pip install -r requirements.txt

▶️ Run benchmark

python benchmark.py

📈 Output

  • Classification report
  • Confusion matrix
  • Balanced accuracy
  • F1-score (macro)
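
All of these are standard scikit-learn metrics; the sketch below shows roughly what evaluate.py computes, assuming y_true and y_pred arrays from a benchmark run:

from sklearn.metrics import (classification_report, confusion_matrix,
                             balanced_accuracy_score, f1_score)

print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))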

📂 Project Structure

gmm-minority-classification/
├── gmm_classifier.py        # GMM model logic
├── data_loader.py           # Dataset loaders (3 total)
├── imbalance_sampler.py     # Undersampling function
├── benchmark.py             # Multi-dataset test harness
├── evaluate.py              # Metric evaluation functions
├── push_to_huggingface.py   # Upload model to HF hub
├── requirements.txt
├── README.md
└── .gitignore

📚 Research & Citations

This project builds on the following key research works:

GMM & Probabilistic Models

  1. Dempster et al. (1977) - Maximum likelihood via EM
  2. Bishop, C. (2006) - Pattern Recognition and Machine Learning
  3. McLachlan & Peel (2000) - Finite Mixture Models
  4. Reynolds et al. (2009) - Gaussian Mixture Modeling for Classification
  5. Bouveyron et al. (2007) - High-dimensional GMM classification

Imbalanced Classification

  1. Chawla et al. (2002) - SMOTE
  2. He & Garcia (2009) - Learning from Imbalanced Data
  3. Japkowicz (2000) - The Class Imbalance Problem: A Historical Perspective
  4. Buda et al. (2018) - A systematic study of class imbalance
  5. Liu et al. (2009) - EasyEnsemble and BalanceCascade

Evaluation Metrics

  1. Sokolova & Lapalme (2009) - A systematic analysis of performance measures
  2. Van Rijsbergen (1979) - Information Retrieval (F-measure origin)

Dataset Papers

  1. Dua & Graff (2019) - UCI Machine Learning Repository
  2. Lichman (2013) - Adult Dataset
  3. Dal Pozzolo et al. (2015) - Credit Card Fraud Dataset

Recent Works & Variants

  1. Loquercio et al. (2020) - Generative Models for Anomaly Detection
  2. Roy et al. (2022) - GMM on Tabular Data
  3. Fuchs et al. (2023) - Robust GMM Variants
  4. Ren et al. (2023) - Mixture of Experts for Class Imbalance
  5. Guo et al. (2021) - Bayesian GMMs in Skewed Data
  6. Cao et al. (2021) - Confidence-aware GMMs
  7. Wang et al. (2023) - Deep Mixture Models for Rare Class Learning
  8. Han et al. (2022) - Label Noise and GMM
  9. Kim et al. (2022) - Hybrid GMM for Multi-Class Tabular Data
  10. Cortes et al. (2025) - Margin-aware Mixture Models

🤗 Push to Hugging Face

To publish the trained GMM:

huggingface-cli login
python push_to_huggingface.py

To create the target repository on the Hub first, you can also run:

huggingface-cli repo create gmm-imbalance-model --type=model
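
If you prefer to push from Python directly, here is a minimal sketch using huggingface_hub; the repo id is a placeholder, and push_to_huggingface.py may wrap these steps differently:

from huggingface_hub import HfApi

api = HfApi()  # uses the token stored by `huggingface-cli login`
api.create_repo(repo_id="YOUR_USERNAME/gmm-imbalance-model", repo_type="model", exist_ok=True)
api.upload_file(
    path_or_fileobj="gmm_pretrained_model.pkl",
    path_in_repo="gmm_pretrained_model.pkl",
    repo_id="YOUR_USERNAME/gmm-imbalance-model",
)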

🙌 Authors

Saurav Singla
📬 github.com/sauravsingla



📦 Pretrained GMM Model

We provide a pretrained model bundle, gmm_pretrained_model.pkl, in this repository; it contains a fitted scaler and one GMM per class.

🔧 Load Model in Python

from joblib import load

# Load the bundle from a local file (scaler + one GMM per class)
model_bundle = load("gmm_pretrained_model.pkl")
scaler = model_bundle["scaler"]
model_0 = model_bundle["model_0"]  # GMM fitted on class 0
model_1 = model_bundle["model_1"]  # GMM fitted on class 1

# Predict on new data (X_new: array of shape (n_samples, n_features))
X_scaled = scaler.transform(X_new)
score_0 = model_0.score_samples(X_scaled)  # per-sample log-likelihood under class 0
score_1 = model_1.score_samples(X_scaled)  # per-sample log-likelihood under class 1
y_pred = (score_1 > score_0).astype(int)   # higher log-likelihood wins (assumes comparable class priors)

🌐 Load from Hugging Face

from huggingface_hub import hf_hub_download
from joblib import load

# Download the bundle from the Hub, then load it exactly as above
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/gmm-imbalance-model",
    filename="gmm_pretrained_model.pkl",
)
model_bundle = load(model_path)
