added info about dataset link
README.md CHANGED
@@ -15,7 +15,7 @@ We have fine-tuned [INDUS Model](https://huggingface.co/nasa-impact/nasa-smd-ibm
 - **Base Model:** INDUS, fine-tuned for multi-label classification.
 - **Loss Function:** The model uses focal loss instead of traditional cross-entropy to address label imbalance by focusing on difficult-to-classify examples.
-- **Dataset:** NASA's CMR metadata, filtered to remove duplicates and irrelevant labels, resulting in a

@@ -25,7 +25,7 @@ We have fine-tuned [INDUS Model](https://huggingface.co/nasa-impact/nasa-smd-ibm
 - **Stratified Splitting:** The dataset is split based on `provider-id` to maintain balanced representation across train, validation, and test sets.
 - **Improved Performance:** Focal loss with different focusing parameters (γ) was evaluated, showing significant improvements in weighted precision, recall, F1 score, and Jaccard similarity over cross-entropy loss and previous models.
-
 ## Experiments

 1. **Baseline (alpha-1.0.1):** Used cross-entropy loss.
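For readers unfamiliar with the focal loss mentioned above, here is a minimal sketch of how it down-weights easy examples in a multi-label setting. This is an illustration, not the repository's training code; the defaults γ=2.0 and α=0.25 are common choices, not values reported for this model:

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Multi-label focal loss: down-weights well-classified labels so
    training focuses on difficult-to-classify examples.

    probs   -- predicted probabilities per label, shape (n_labels,)
    targets -- multi-hot ground truth, shape (n_labels,)
    gamma   -- focusing parameter; gamma=0 recovers weighted BCE
    alpha   -- balancing weight for positive vs. negative labels
    """
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    # p_t is the probability the model assigned to the true class of each label
    p_t = np.where(targets == 1, probs, 1 - probs)
    alpha_t = np.where(targets == 1, alpha, 1 - alpha)
    # (1 - p_t)^gamma shrinks the contribution of easy, confident labels
    return -np.mean(alpha_t * (1 - p_t) ** gamma * np.log(p_t))

# A confidently correct prediction contributes far less than a wrong one
easy = focal_loss(np.array([0.95, 0.05]), np.array([1, 0]))
hard = focal_loss(np.array([0.30, 0.70]), np.array([1, 0]))
assert hard > easy
```

With γ=0 this reduces to a weighted binary cross-entropy, which is what makes the γ sweep described in the README a natural experiment.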
 - **Base Model:** INDUS, fine-tuned for multi-label classification.
 - **Loss Function:** The model uses focal loss instead of traditional cross-entropy to address label imbalance by focusing on difficult-to-classify examples.
+- **Dataset:** NASA's CMR metadata, filtered to remove duplicates and irrelevant labels, resulting in a dataset of 42,474 records and 3,240 labels. You can find the [dataset here](https://huggingface.co/datasets/nasa-impact/science-keyword-classification-dataset)
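A 3,240-label classification target is typically represented as a multi-hot vector per record. As a hypothetical illustration of encoding CMR science keywords this way (the field names and keyword values below are invented, not the dataset's actual schema):

```python
def build_label_index(records):
    """Map every keyword seen in the records to a stable column index."""
    keywords = sorted({kw for rec in records for kw in rec["keywords"]})
    return {kw: i for i, kw in enumerate(keywords)}

def to_multi_hot(record, label_index):
    """Encode one record's keywords as a 0/1 vector over all labels."""
    vec = [0] * len(label_index)
    for kw in record["keywords"]:
        vec[label_index[kw]] = 1
    return vec

# Toy records standing in for CMR metadata entries
records = [
    {"keywords": ["AEROSOLS", "CLOUDS"]},
    {"keywords": ["CLOUDS", "SEA ICE"]},
]
index = build_label_index(records)
assert to_multi_hot(records[0], index) == [1, 1, 0]
```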
 - **Stratified Splitting:** The dataset is split based on `provider-id` to maintain balanced representation across train, validation, and test sets.
 - **Improved Performance:** Focal loss with different focusing parameters (γ) was evaluated, showing significant improvements in weighted precision, recall, F1 score, and Jaccard similarity over cross-entropy loss and previous models.
+
 ## Experiments

 1. **Baseline (alpha-1.0.1):** Used cross-entropy loss.
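The `provider-id`-based stratified split could be sketched roughly as below. This is an assumption about the approach: the README does not give the actual split ratios or implementation, so the 80/10/10 ratios and function name here are illustrative only.

```python
import random

def stratified_split(records, key="provider-id", ratios=(0.8, 0.1, 0.1), seed=42):
    """Split records so each provider appears in train/val/test
    in roughly the given proportions."""
    # Group records by provider so every provider is represented in each split
    by_provider = {}
    for rec in records:
        by_provider.setdefault(rec[key], []).append(rec)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for group in by_provider.values():
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

# Toy data: 30 records spread over 3 providers
data = [{"provider-id": f"P{i % 3}", "id": i} for i in range(30)]
train, val, test = stratified_split(data)
assert len(train) + len(val) + len(test) == 30
```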