metadata

language:
  - en
license: apache-2.0
library_name: autogluon
tags:
  - binary-classification
  - multi-class-classification
  - text-classification
  - embeddings
  - umap
  - autogluon
datasets:
  - 112_Tiering_Questions_02.28.2025.json
model-index:
  - name: 03062025_V2_UMAP_Embedding_Classifier (Binary)
    results:
      - task:
          type: text-classification
          name: Binary Classification
        dataset:
          name: 112_Tiering_Questions_02.28.2025.json
          type: tabular
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.9565
          - name: F1
            type: f1
            value: 0.97
          - name: ROC AUC
            type: roc_auc
            value: 0.91
  - name: 03062025_V2_UMAP_Embedding_Classifier (Multi-class)
    results:
      - task:
          type: text-classification
          name: Multi-class Classification
        dataset:
          name: 112_Tiering_Questions_02.28.2025.json
          type: tabular
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.5652
          - name: F1
            type: f1
            value: 0.59
          - name: ROC AUC
            type: roc_auc
            value: 0.74

03062025_V2_UMAP_Embedding_Classifier

This repository contains two final AutoGluon TabularPredictor models, one binary and one multi-class, trained on UMAP-reduced text embeddings produced by the Alibaba-NLP/gte-large-en-v1.5 model.

Key Details

UMAP for Binary Classification: n_components = 11 (best value found by Optuna).
UMAP for Multi-class Classification: n_components = 43 (best value found by Optuna).
Data: 112 technical questions, each labeled with a tier (0–4).
Performance Metrics:
- Binary: Accuracy ≈95.65%, F1 ≈0.97, ROC AUC ≈0.91.
- Multi-class: Accuracy ≈56.52%, F1 ≈0.59, ROC AUC ≈0.74.
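
The n_components values above came from an Optuna search. The search range, objective, and scoring model are not documented here, so the following is only a rough sketch of how such a search could be set up. The helper names (umap_reduce, cv_accuracy, make_objective, tune_n_components), the 2 to 64 range, the logistic-regression scorer, the 5 folds, and the 30 trials are all assumptions made for illustration, not the settings used for these models.

```python
def umap_reduce(embeddings, n_components, seed=42):
    """Fit UMAP on the embeddings and return the reduced array (needs umap-learn)."""
    import umap  # imported lazily; only needed when this default is used

    reducer = umap.UMAP(n_components=n_components, random_state=seed)
    return reducer.fit_transform(embeddings)


def cv_accuracy(features, labels, folds=5):
    """Mean cross-validated accuracy of a stand-in classifier (needs scikit-learn)."""
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    model = LogisticRegression(max_iter=1000)
    return float(cross_val_score(model, features, labels, cv=folds).mean())


def make_objective(embeddings, labels, low=2, high=64,
                   reduce_fn=umap_reduce, score_fn=cv_accuracy):
    """Build an Optuna objective that scores one candidate n_components value."""

    def objective(trial):
        # Assumed search range; the range actually used is not documented.
        n_components = trial.suggest_int("n_components", low, high)
        # Simplification: UMAP is fitted on all rows before cross-validation.
        reduced = reduce_fn(embeddings, n_components)
        return score_fn(reduced, labels)

    return objective


def tune_n_components(embeddings, labels, n_trials=30):
    """Run the study and return the best n_components found."""
    import optuna

    study = optuna.create_study(direction="maximize")
    study.optimize(make_objective(embeddings, labels), n_trials=n_trials)
    return study.best_params["n_components"]
```

Passing reduce_fn and score_fn as arguments keeps the objective independent of any particular reducer or classifier, so the same skeleton could score candidates with an AutoGluon model instead of the stand-in used here.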

Usage

Loading the Models:

from autogluon.tabular import TabularPredictor

# Each path is the directory of a saved predictor.
binary_predictor = TabularPredictor.load("binary_final_model")
multi_predictor = TabularPredictor.load("multiclass_final_model")

Preprocessing: Generate embeddings for your input text with the Alibaba-NLP/gte-large-en-v1.5 model, then apply the matching fitted UMAP reducer: umap_reducer_binary.joblib for the binary model and umap_reducer_multi.joblib for the multi-class model.
Prediction: Call predict() for class labels and predict_proba() for per-class probabilities.
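
The preprocessing and prediction steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the exact pipeline used in training: it assumes the embeddings were produced with sentence-transformers at default settings, that the reducer files are pickled umap-learn objects, and that the predictors were trained on the UMAP components alone, so the feature names can be read back from the predictor. The helper names load_artifacts and predict_tiers are hypothetical.

```python
import numpy as np
import pandas as pd


def load_artifacts(predictor_dir, reducer_path):
    """Load the embedder, one fitted UMAP reducer, and its matching predictor."""
    # Heavy dependencies are imported lazily, only when artifacts are loaded.
    import joblib
    from autogluon.tabular import TabularPredictor
    from sentence_transformers import SentenceTransformer

    # Assumption: sentence-transformers defaults match the training embeddings.
    embedder = SentenceTransformer(
        "Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True
    )
    reducer = joblib.load(reducer_path)  # umap-learn must be installed to unpickle
    predictor = TabularPredictor.load(predictor_dir)
    return embedder, reducer, predictor


def predict_tiers(texts, embedder, reducer, predictor):
    """Embed the texts, reduce them with UMAP, and return labels and probabilities."""
    embeddings = np.asarray(embedder.encode(list(texts)))
    reduced = reducer.transform(embeddings)
    # Assumption: the training features were exactly the UMAP components,
    # so the predictor's own feature names are reused as column names.
    features = pd.DataFrame(reduced, columns=predictor.features())
    labels = predictor.predict(features)
    probabilities = predictor.predict_proba(features)
    return labels, probabilities
```

For the binary model, call load_artifacts("binary_final_model", "umap_reducer_binary.joblib") and pass the three returned objects to predict_tiers together with a list of question strings. The multi-class model works the same way with "multiclass_final_model" and "umap_reducer_multi.joblib". Each reducer should only be paired with the predictor it was fitted for, since the two use different n_components.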

License

This project is licensed under the Apache-2.0 License.

Contact

For questions or collaboration, please contact LeiPricingManager.