LeiPricingManager committed on
Commit de0e968 · verified · 1 Parent(s): 129c7f2

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +31 -0
README.md ADDED
@@ -0,0 +1,31 @@
+ # 03062025_V2_UMAP_Embedding_Classifier
+
+ This repository contains two final AutoGluon TabularPredictor models (binary and multi-class) built using UMAP-reduced embeddings from the [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) model.
+
+ ## Key Details
+
+ - **UMAP for Binary Classification**: Optuna tuning selected `n_components = 11`.
+ - **UMAP for Multi-class Classification**: Optuna tuning selected `n_components = 43`.
+ - **Data**: 112 technical questions with tier labels (0–4).
+ - **Performance Metrics**:
+   - **Binary**: Accuracy ≈95.65%, F1 ≈0.97, ROC AUC ≈0.91.
+   - **Multi-class**: Accuracy ≈56.52%, F1 ≈0.59, ROC AUC ≈0.74.
+
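The two accuracy figures above can be sanity-checked with a little arithmetic. The sketch below is not part of the original evaluation: it searches for evaluation-set sizes at which both reported accuracies correspond to a whole number of correct predictions. The README does not state the train/test split, so the function name `consistent_test_sizes` and the idea that both models were scored on the same held-out set are assumptions.

```python
from typing import List


def consistent_test_sizes(
    accuracies_pct: List[float], max_size: int, tol: float = 0.005
) -> List[int]:
    """Return set sizes n <= max_size where every accuracy matches k/n for some integer k."""
    sizes = []
    for n in range(1, max_size + 1):
        # For each reported accuracy, check whether some integer count of
        # correct predictions reproduces it to two decimal places.
        if all(
            any(abs(100 * k / n - acc) < tol for k in range(n + 1))
            for acc in accuracies_pct
        ):
            sizes.append(n)
    return sizes


# Reported accuracies (binary, multi-class) and the dataset size from Key Details.
print(consistent_test_sizes([95.65, 56.52], max_size=112))  # → [23, 46, 69, 92]
```

The smallest match is 23 samples (22 of 23 correct for the binary model, 13 of 23 for the multi-class model), roughly a fifth of the 112 questions. If that reading is right, a single prediction shifts accuracy by about 4 percentage points, so the metrics are best read as rough estimates.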
+ ## Usage
+
+ 1. **Loading the Models**:
+    ```python
+    from autogluon.tabular import TabularPredictor
+    # Each path is the saved predictor directory from this repository.
+    binary_predictor = TabularPredictor.load("binary_final_model")
+    multi_predictor = TabularPredictor.load("multiclass_final_model")
+    ```
+
+ 2. **Preprocessing**: Generate embeddings for your input text with the Alibaba-NLP/gte-large-en-v1.5 model, then apply the matching fitted UMAP reducer: `umap_reducer_binary.joblib` for the binary model, `umap_reducer_multi.joblib` for the multi-class model.
+
+ 3. **Prediction**: Call `predict()` for class labels and `predict_proba()` for class probabilities on the reduced features.
+
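The three steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the author's pipeline: it assumes the embeddings were produced with `sentence-transformers` using default `encode` settings, that `umap-learn` is installed so the reducer can be unpickled, and that the predictor's features are exactly the UMAP components in order. The helper names `to_records` and `predict_tiers` are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def to_records(
    reduced: Iterable[Sequence[float]], columns: Sequence[str]
) -> List[Dict[str, float]]:
    """Pair each reduced vector with the predictor's feature names."""
    records = []
    for row in reduced:
        if len(row) != len(columns):
            raise ValueError(
                f"reducer produced {len(row)} components, "
                f"predictor expects {len(columns)}"
            )
        records.append({name: float(value) for name, value in zip(columns, row)})
    return records


def predict_tiers(texts: List[str], model_dir: str, reducer_path: str):
    """Embed, reduce, and classify texts; returns (labels, probabilities)."""
    # Heavy dependencies are imported lazily so the helper above stays lightweight.
    import joblib
    import pandas as pd
    from autogluon.tabular import TabularPredictor
    from sentence_transformers import SentenceTransformer

    # Assumption: default encode() settings match those used during training.
    embedder = SentenceTransformer(
        "Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True
    )
    embeddings = embedder.encode(texts)

    # Fitted UMAP reducer shipped with the repository (requires umap-learn).
    reducer = joblib.load(reducer_path)
    reduced = reducer.transform(embeddings)

    predictor = TabularPredictor.load(model_dir)
    # Assumption: the predictor's features are the UMAP components, in order.
    features = pd.DataFrame.from_records(to_records(reduced, predictor.features()))
    return predictor.predict(features), predictor.predict_proba(features)
```

For the binary model, a call would look like `predict_tiers(["How do I rotate an API key?"], "binary_final_model", "umap_reducer_binary.joblib")`; for the multi-class model, swap in `"multiclass_final_model"` and `"umap_reducer_multi.joblib"`. The width check in `to_records` fails early if a reducer is paired with the wrong predictor, for example the 11-component binary reducer with the multi-class model.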
+ ## License
+ This project is licensed under the Apache-2.0 License.
+
+ ## Contact
+ For questions or collaboration, please contact LeiPricingManager.