03062025_V2_UMAP_Embedding_Classifier

This repository contains two final AutoGluon TabularPredictor models (binary and multi-class) built using UMAP-reduced embeddings from the Alibaba-NLP/gte-large-en-v1.5 model.

Key Details

  • UMAP for Binary Classification: best n_components = 11 (selected with Optuna).
  • UMAP for Multi-class Classification: best n_components = 43 (selected with Optuna; an illustrative tuning sketch follows this list).
  • Data: 112 technical questions labeled with tier classifications (0–4).
  • Performance Metrics:
    • Binary: Accuracy ≈ 95.65%, F1 ≈ 0.97, ROC AUC ≈ 0.91.
    • Multi-class: Accuracy ≈ 56.52%, F1 ≈ 0.59, ROC AUC ≈ 0.74.
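
The n_components values above were selected with Optuna. The exact search objective is not documented in this card; the following is a minimal, hypothetical sketch that scores each candidate value with a cross-validated logistic-regression proxy classifier.

    # Illustrative only: the actual Optuna objective used for this model is not
    # documented here. This sketch reduces the embeddings with UMAP and scores
    # each n_components value via cross-validated logistic regression.
    import optuna
    import umap
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def make_objective(embeddings, labels):
        def objective(trial):
            n_components = trial.suggest_int("n_components", 2, 64)
            reducer = umap.UMAP(n_components=n_components, random_state=42)
            reduced = reducer.fit_transform(embeddings)
            return cross_val_score(LogisticRegression(max_iter=1000),
                                   reduced, labels, cv=5).mean()
        return objective

    # study = optuna.create_study(direction="maximize")
    # study.optimize(make_objective(embeddings, labels), n_trials=50)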

Usage

  1. Loading the Models:

    from autogluon.tabular import TabularPredictor

    # Load the saved binary and multi-class predictors from their model directories
    binary_predictor = TabularPredictor.load("binary_final_model")
    multi_predictor = TabularPredictor.load("multiclass_final_model")
    
  2. Preprocessing: Generate embeddings for your input text with the Alibaba-NLP/gte-large-en-v1.5 model, then apply the UMAP transformation using the provided reducer files (umap_reducer_binary.joblib for the binary model, umap_reducer_multi.joblib for the multi-class model). A combined sketch of steps 2 and 3 follows this list.

  3. Prediction: Call predict() for class labels and predict_proba() for class probabilities.
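
A minimal sketch of steps 2 and 3, assuming the sentence-transformers library for embedding and joblib for loading the reducers; the input column names are an assumption and must match whatever names the predictors were trained with.

    # Sketch only: embed text, reduce with the shipped UMAP reducers, then predict.
    import joblib
    import pandas as pd
    from sentence_transformers import SentenceTransformer

    # Embed the input text with the same encoder used at training time.
    encoder = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)
    embeddings = encoder.encode(["How do I configure the pricing engine?"])

    # Reduce dimensionality with the fitted reducers from this repository.
    binary_reducer = joblib.load("umap_reducer_binary.joblib")
    multi_reducer = joblib.load("umap_reducer_multi.joblib")

    # Feature column names below are hypothetical; use the training-time names.
    X_binary = pd.DataFrame(binary_reducer.transform(embeddings),
                            columns=[f"umap_{i}" for i in range(11)])
    X_multi = pd.DataFrame(multi_reducer.transform(embeddings),
                           columns=[f"umap_{i}" for i in range(43)])

    # Predict with the predictors loaded in step 1.
    print(binary_predictor.predict(X_binary))       # class labels
    print(multi_predictor.predict_proba(X_multi))   # class probabilities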

License

This project is licensed under the Apache-2.0 License.

Contact

For questions or collaboration, please contact LeiPricingManager.
