metadata
language:
  - en
license: apache-2.0
library_name: autogluon
tags:
  - binary-classification
  - multi-class-classification
  - text-classification
  - embeddings
  - umap
  - autogluon
datasets:
  - 112_Tiering_Questions_02.28.2025.json
model-index:
  - name: 03062025_V2_UMAP_Embedding_Classifier (Binary)
    results:
      - task:
          type: text-classification
          name: Binary Classification
        dataset:
          name: 112_Tiering_Questions_02.28.2025.json
          type: tabular
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.9565
          - name: F1
            type: f1
            value: 0.97
          - name: ROC AUC
            type: roc_auc
            value: 0.91
  - name: 03062025_V2_UMAP_Embedding_Classifier (Multi-class)
    results:
      - task:
          type: text-classification
          name: Multi-class Classification
        dataset:
          name: 112_Tiering_Questions_02.28.2025.json
          type: tabular
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.5652
          - name: F1
            type: f1
            value: 0.59
          - name: ROC AUC
            type: roc_auc
            value: 0.74

03062025_V2_UMAP_Embedding_Classifier

This repository contains two final AutoGluon TabularPredictor models, one binary and one multi-class, trained on UMAP-reduced text embeddings from the Alibaba-NLP/gte-large-en-v1.5 model.

Key Details

  • UMAP for Binary Classification: n_components = 11, selected with Optuna.
  • UMAP for Multi-class Classification: n_components = 43, selected with Optuna.
  • Data: 112 technical questions, each labeled with a tier from 0 to 4.
  • Performance Metrics:
    • Binary: Accuracy ≈ 95.65%, F1 ≈ 0.97, ROC AUC ≈ 0.91.
    • Multi-class: Accuracy ≈ 56.52%, F1 ≈ 0.59, ROC AUC ≈ 0.74.
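
Both accuracy figures are consistent with a small evaluation set: 22/23 ≈ 0.9565 and 13/23 ≈ 0.5652. The split is not documented here, so the sketch below is only an arithmetic check on the reported numbers, not a statement of how the models were evaluated. The helper name consistent_test_sizes and the 4-decimal rounding are assumptions made for illustration.

```python
def consistent_test_sizes(accuracies, max_n=112, decimals=4):
    """Return every test-set size n <= max_n for which each reported
    accuracy equals k/n (rounded to `decimals` places) for some integer k.

    max_n defaults to 112, the total number of questions in the dataset.
    """
    sizes = []
    for n in range(1, max_n + 1):
        # Every reported accuracy must be reachable as k correct out of n.
        if all(
            any(round(k / n, decimals) == acc for k in range(n + 1))
            for acc in accuracies
        ):
            sizes.append(n)
    return sizes


# Reported binary and multi-class accuracies from the metrics above.
print(consistent_test_sizes([0.9565, 0.5652]))  # → [23, 46, 69, 92]
```

If the evaluation set does hold 23 questions, one misclassified question moves accuracy by about 4.3 percentage points, so the metrics are best read as rough estimates.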

Usage

  1. Loading the Models:

    from autogluon.tabular import TabularPredictor

    # Each path is a saved predictor directory from this repository.
    binary_predictor = TabularPredictor.load("binary_final_model")
    multi_predictor = TabularPredictor.load("multiclass_final_model")
    
  2. Preprocessing: Generate embeddings for your input text with the Alibaba-NLP/gte-large-en-v1.5 model, then apply the fitted UMAP transformation from the provided reducer files: umap_reducer_binary.joblib for the binary model and umap_reducer_multi.joblib for the multi-class model.

  3. Prediction: Call predict() on the reduced features for class labels and predict_proba() for per-class probabilities.
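
The three steps above can be combined into one short pipeline. The following is a minimal sketch, not the author's exact inference code. It assumes that sentence-transformers, umap-learn, and joblib are installed, that the embeddings were produced with the encoder's default settings, and that the UMAP components map onto the predictor's feature columns in order. The column names are read from the predictor itself because they are not documented here. The helper names load_artifacts and classify are hypothetical.

```python
import numpy as np
import pandas as pd


def load_artifacts(predictor_dir, reducer_path):
    """Load the embedding model, a fitted UMAP reducer, and a predictor."""
    # Heavy imports stay local so classify() can be used without them.
    import joblib  # unpickling the reducer also requires umap-learn
    from autogluon.tabular import TabularPredictor
    from sentence_transformers import SentenceTransformer

    # gte-large-en-v1.5 ships custom modeling code, hence trust_remote_code.
    encoder = SentenceTransformer(
        "Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True
    )
    reducer = joblib.load(reducer_path)
    predictor = TabularPredictor.load(predictor_dir)
    return encoder, reducer, predictor


def classify(texts, encoder, reducer, predictor):
    """Embed texts, reduce them with the fitted reducer, and predict."""
    embeddings = np.asarray(encoder.encode(list(texts)))
    # transform() reuses the fitted projection; do not refit on new inputs.
    reduced = reducer.transform(embeddings)
    # ASSUMPTION: component i corresponds to the predictor's i-th feature.
    # A column-count mismatch raises here instead of predicting silently.
    features = pd.DataFrame(reduced, columns=predictor.features())
    return predictor.predict(features), predictor.predict_proba(features)
```

For the binary model, call load_artifacts("binary_final_model", "umap_reducer_binary.joblib") and pass the three returned objects to classify() together with a list of question strings. The multi-class model works the same way with "multiclass_final_model" and "umap_reducer_multi.joblib".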

License

This project is licensed under the Apache-2.0 License.

Contact

For questions or collaboration, please contact LeiPricingManager.