added info about dataset link
README.md CHANGED
@@ -15,7 +15,7 @@ We have fine-tuned [INDUS Model](https://huggingface.co/nasa-impact/nasa-smd-ibm
 - **Base Model:** INDUS, fine-tuned for multi-label classification.
 - **Loss Function:** The model uses focal loss instead of traditional cross-entropy to address label imbalance by focusing on difficult-to-classify examples.
-- **Dataset:** NASA's CMR metadata, filtered to remove duplicates and irrelevant labels, resulting in a

@@ -25,7 +25,7 @@ We have fine-tuned [INDUS Model](https://huggingface.co/nasa-impact/nasa-smd-ibm
 - **Stratified Splitting:** The dataset is split based on `provider-id` to maintain balanced representation across train, validation, and test sets.
 - **Improved Performance:** Focal loss with different focusing parameters (γ) was evaluated, showing significant improvements in weighted precision, recall, F1 score, and Jaccard similarity over cross-entropy loss and previous models.
-
 ## Experiments

 1. **Baseline (alpha-1.0.1):** Used cross-entropy loss.
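For readers unfamiliar with the focal loss mentioned above, here is a minimal sketch of how it down-weights easy examples in a multi-label setting. This is an illustration, not the repository's training code; the defaults γ=2.0 and α=0.25 are common choices, not values reported for this model:

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Multi-label focal loss: down-weights well-classified labels so
    training focuses on difficult-to-classify examples.

    probs   -- predicted probabilities per label, shape (n_labels,)
    targets -- multi-hot ground truth, shape (n_labels,)
    gamma   -- focusing parameter; gamma=0 recovers weighted BCE
    alpha   -- balancing weight for positive vs. negative labels
    """
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    # p_t is the probability the model assigned to the true class of each label
    p_t = np.where(targets == 1, probs, 1 - probs)
    alpha_t = np.where(targets == 1, alpha, 1 - alpha)
    # (1 - p_t)^gamma shrinks the contribution of easy, confident labels
    return -np.mean(alpha_t * (1 - p_t) ** gamma * np.log(p_t))

# A confidently correct prediction contributes far less than a wrong one
easy = focal_loss(np.array([0.95, 0.05]), np.array([1, 0]))
hard = focal_loss(np.array([0.30, 0.70]), np.array([1, 0]))
assert hard > easy
```

With γ=0 this reduces to a weighted binary cross-entropy, which is what makes the γ sweep described in the README a natural experiment.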
 - **Base Model:** INDUS, fine-tuned for multi-label classification.
 - **Loss Function:** The model uses focal loss instead of traditional cross-entropy to address label imbalance by focusing on difficult-to-classify examples.
+- **Dataset:** NASA's CMR metadata, filtered to remove duplicates and irrelevant labels, resulting in a dataset of 42,474 records and 3,240 labels. You can find the [dataset here](https://huggingface.co/datasets/nasa-impact/science-keyword-classification-dataset)
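A 3,240-label classification target is typically represented as a multi-hot vector per record. As a hypothetical illustration of encoding CMR science keywords this way (the field names and keyword values below are invented, not the dataset's actual schema):

```python
def build_label_index(records):
    """Map every keyword seen in the records to a stable column index."""
    keywords = sorted({kw for rec in records for kw in rec["keywords"]})
    return {kw: i for i, kw in enumerate(keywords)}

def to_multi_hot(record, label_index):
    """Encode one record's keywords as a 0/1 vector over all labels."""
    vec = [0] * len(label_index)
    for kw in record["keywords"]:
        vec[label_index[kw]] = 1
    return vec

# Toy records standing in for CMR metadata entries
records = [
    {"keywords": ["AEROSOLS", "CLOUDS"]},
    {"keywords": ["CLOUDS", "SEA ICE"]},
]
index = build_label_index(records)
assert to_multi_hot(records[0], index) == [1, 1, 0]
```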
 - **Stratified Splitting:** The dataset is split based on `provider-id` to maintain balanced representation across train, validation, and test sets.
 - **Improved Performance:** Focal loss with different focusing parameters (γ) was evaluated, showing significant improvements in weighted precision, recall, F1 score, and Jaccard similarity over cross-entropy loss and previous models.
+
 ## Experiments

 1. **Baseline (alpha-1.0.1):** Used cross-entropy loss.
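The `provider-id`-based stratified split could be sketched roughly as below. This is an assumption about the approach: the README does not give the actual split ratios or implementation, so the 80/10/10 ratios and function name here are illustrative only.

```python
import random

def stratified_split(records, key="provider-id", ratios=(0.8, 0.1, 0.1), seed=42):
    """Split records so each provider appears in train/val/test
    in roughly the given proportions."""
    # Group records by provider so every provider is represented in each split
    by_provider = {}
    for rec in records:
        by_provider.setdefault(rec[key], []).append(rec)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for group in by_provider.values():
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

# Toy data: 30 records spread over 3 providers
data = [{"provider-id": f"P{i % 3}", "id": i} for i in range(30)]
train, val, test = stratified_split(data)
assert len(train) + len(val) + len(test) == 30
```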