03062025_V2_UMAP_Embedding_Classifier
This repository contains two final AutoGluon TabularPredictor models (binary and multi-class) built using UMAP-reduced embeddings from the Alibaba-NLP/gte-large-en-v1.5 model.
Key Details
- UMAP for Binary Classification: Best n_components tuned via Optuna = 11.
- UMAP for Multi-class Classification: Best n_components tuned via Optuna = 43 (a hypothetical sketch of this tuning loop follows this list).
- Data: 112 technical questions with tiering classifications (0–4).
- Performance Metrics:
  - Binary: Accuracy ≈ 95.65%, F1 ≈ 0.97, ROC AUC ≈ 0.91.
  - Multi-class: Accuracy ≈ 56.52%, F1 ≈ 0.59, ROC AUC ≈ 0.74.
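The tuning loop referenced above is not included in this repository, so the following is only a hypothetical sketch of how n_components could be selected with Optuna: the search range, the logistic-regression probe classifier, and the 5-fold accuracy objective are all assumptions, not the documented setup.

```python
# Hypothetical Optuna search over UMAP n_components. Every modeling choice here
# (search range, probe classifier, CV scoring) is an assumption for illustration.
import optuna
import umap
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def make_objective(embeddings, labels):
    def objective(trial):
        # Search range 2-64 is an assumption; the repository reports best values of 11 and 43.
        n_components = trial.suggest_int("n_components", 2, 64)
        reducer = umap.UMAP(n_components=n_components, random_state=42)
        reduced = reducer.fit_transform(embeddings)
        # Cheap probe classifier to score the reduced features.
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, reduced, labels, cv=5, scoring="accuracy").mean()
    return objective

# study = optuna.create_study(direction="maximize")
# study.optimize(make_objective(embeddings, labels), n_trials=50)
```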
Usage
Loading the Models:
```python
from autogluon.tabular import TabularPredictor

binary_predictor = TabularPredictor.load("binary_final_model")
multi_predictor = TabularPredictor.load("multiclass_final_model")
```
Preprocessing: Generate embeddings for your input text using the Alibaba-NLP/gte-large-en-v1.5 model and apply the UMAP transformation with the provided reducer files (umap_reducer_binary.joblib and umap_reducer_multi.joblib).
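A minimal preprocessing sketch is shown below. It assumes the sentence-transformers loader for gte-large-en-v1.5 (which requires trust_remote_code=True), and the umap_0, umap_1, ... column names are invented for illustration; adjust them to match the feature names the predictors were actually trained on.

```python
import joblib
import pandas as pd
from sentence_transformers import SentenceTransformer

# 1. Embed the input text with gte-large-en-v1.5.
encoder = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)
texts = ["Example technical question about system tiering."]
embeddings = encoder.encode(texts)

# 2. Reduce the embeddings with the provided UMAP reducer (11 components for binary).
binary_reducer = joblib.load("umap_reducer_binary.joblib")
reduced = binary_reducer.transform(embeddings)

# 3. Wrap the reduced features in a DataFrame for AutoGluon.
#    Column names are an assumption; they must match the training feature names.
features = pd.DataFrame(reduced, columns=[f"umap_{i}" for i in range(reduced.shape[1])])
```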
Prediction: Use predict() to obtain class labels and predict_proba() to obtain class probabilities.
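Continuing the sketch above (reusing binary_predictor from the loading snippet and features from the preprocessing snippet):

```python
# Hard labels and per-class probabilities from the binary model.
preds = binary_predictor.predict(features)
probs = binary_predictor.predict_proba(features)
print(preds.iloc[0], probs.iloc[0].to_dict())

# For the multi-class model, reduce with umap_reducer_multi.joblib (43 components)
# before calling multi_predictor.predict(...).
```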
License
This project is licensed under the Apache-2.0 License.
Contact
For questions or collaboration, please contact LeiPricingManager.
Evaluation results
All metrics are self-reported on the 112_Tiering_Questions_02.28.2025.json dataset.
- Binary: Accuracy 0.957, F1 0.970, ROC AUC 0.910
- Multi-class: Accuracy 0.565, F1 0.590, ROC AUC 0.740