Text Classification · Transformers · Safetensors · English · roberta
SajilAwale committed f5a8249 (verified) · 1 parent: da0596e

added info about dataset link

Files changed (1): README.md +2 -2
README.md CHANGED
@@ -15,7 +15,7 @@ We have fine-tuned [INDUS Model](https://huggingface.co/nasa-impact/nasa-smd-ibm
 
 - **Base Model:** INDUS, fine-tuned for multi-label classification.
 - **Loss Function:** The model uses focal loss instead of traditional cross-entropy to address label imbalance by focusing on difficult-to-classify examples.
-- **Dataset:** NASA's CMR metadata, filtered to remove duplicates and irrelevant labels, resulting in a training dataset of 42,474 records and 3,240 labels.
+- **Dataset:** NASA's CMR metadata, filtered to remove duplicates and irrelevant labels, resulting in a dataset of 42,474 records and 3,240 labels. You can find the [dataset here](https://huggingface.co/datasets/nasa-impact/science-keyword-classification-dataset)
 
 
 
@@ -25,7 +25,7 @@ We have fine-tuned [INDUS Model](https://huggingface.co/nasa-impact/nasa-smd-ibm
 - **Stratified Splitting:** The dataset is split based on `provider-id` to maintain balanced representation across train, validation, and test sets.
 - **Improved Performance:** Focal loss with different focusing parameters (γ) was evaluated, showing significant improvements in weighted precision, recall, F1 score, and Jaccard similarity over cross-entropy loss and previous models.
 
-...
+
 ## Experiments
 
 1. **Baseline (alpha-1.0.1):** Used cross-entropy loss.
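
The **Loss Function** bullet in the diff above replaces cross-entropy with focal loss to cope with label imbalance. The exact formulation, γ value, and any class weighting used for this model are not part of this commit, so the following is only a minimal sketch of a typical multi-label focal loss over independent sigmoid outputs, with hypothetical `gamma` and `alpha` defaults:

```python
import torch
import torch.nn.functional as F

def multilabel_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Sketch of focal loss for multi-label classification (not this repo's exact code).

    logits:  (batch, num_labels) raw model outputs
    targets: (batch, num_labels) multi-hot 0/1 labels
    gamma:   focusing parameter; gamma=0 reduces to plain binary cross-entropy
    """
    targets = targets.float()
    # Unreduced per-element binary cross-entropy, so each term can be reweighted.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    probs = torch.sigmoid(logits)
    # p_t: the probability the model assigns to the true value of each label.
    p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
    # alpha_t balances positive vs. negative label entries.
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    # (1 - p_t)^gamma down-weights easy, well-classified entries and keeps the
    # gradient focused on difficult-to-classify examples.
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
```

With `gamma=0` this collapses to binary cross-entropy (up to the constant `alpha` factor), which is the kind of baseline the cross-entropy run (alpha-1.0.1) in the Experiments list represents.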
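The **Stratified Splitting** bullet splits on `provider-id` so that train, validation, and test sets keep a similar provider mix. The actual split ratios and tooling are not shown in this commit; a minimal scikit-learn sketch, assuming hypothetical file and column names and a roughly 80/10/10 split, could look like:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; the real CMR export schema is not shown here.
df = pd.read_csv("cmr_records.csv")  # e.g. columns: abstract, science_keywords, provider_id

# Stratify on provider_id (each provider needs at least a couple of records for this
# to work): first carve out a test set, then split the rest into train/validation.
train_val, test = train_test_split(
    df, test_size=0.10, stratify=df["provider_id"], random_state=42
)
train, val = train_test_split(
    train_val, test_size=1 / 9,  # ~10% of the full dataset
    stratify=train_val["provider_id"], random_state=42
)
```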
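Since this commit adds the dataset link, here is a short usage sketch for pulling the linked dataset and a fine-tuned multi-label checkpoint. `MODEL_ID` is a placeholder for this model's repo id, and the dataset's split and column names may differ from what `load_dataset` returns by default:

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Dataset repo id taken from the link added in this commit.
ds = load_dataset("nasa-impact/science-keyword-classification-dataset")

MODEL_ID = "<this-model-repo-id>"  # placeholder: substitute the actual model repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, problem_type="multi_label_classification"
)
```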