Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,31 @@

# 03062025_V2_UMAP_Embedding_Classifier

This repository contains two final AutoGluon TabularPredictor models (binary and multi-class) built using UMAP-reduced embeddings from the [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) model.

## Key Details
- **UMAP for Binary Classification**: Best `n_components` tuned via Optuna = 11 (see the tuning sketch below).
- **UMAP for Multi-class Classification**: Best `n_components` tuned via Optuna = 43.
- **Data**: 112 technical questions with tiering classifications (0–4).
- **Performance Metrics**:
  - **Binary**: Accuracy ≈ 95.65%, F1 ≈ 0.97, ROC AUC ≈ 0.91.
  - **Multi-class**: Accuracy ≈ 56.52%, F1 ≈ 0.59, ROC AUC ≈ 0.74.

## Usage

1. **Loading the Models**:

   ```python
   from autogluon.tabular import TabularPredictor

   binary_predictor = TabularPredictor.load("binary_final_model")
   multi_predictor = TabularPredictor.load("multiclass_final_model")
   ```

2. **Preprocessing**: Generate embeddings for your input text with the Alibaba-NLP/gte-large-en-v1.5 model, then apply the UMAP transformation with the provided reducer files (`umap_reducer_binary.joblib` and `umap_reducer_multi.joblib`).

3. **Prediction**: Call `predict()` and `predict_proba()` on the UMAP-transformed features to obtain class labels and probabilities (see the end-to-end sketch after this list).

## License

This project is licensed under the Apache-2.0 License.

## Contact

For questions or collaboration, please contact LeiPricingManager.