Update README.md
README.md CHANGED
---
license: apache-2.0
base_model: google/bigbird-roberta-base
tags:
- eduscore
- data filter
inference: false
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
---

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/pszemraj/eduscore-regression/runs/04oc07hx)

# bigbird-roberta-base: eduscore

Similar to the [original fineweb-edu classifier](https://hf.co/HuggingFaceFW/fineweb-edu-classifier), this model predicts a score from 0 to 5 for the 'educational quality' of a given text. It was fine-tuned at its maximum context length of 4096 tokens.
## Usage

Note that this example runs on CPU. For GPU, a few small changes are needed; see the sketch after the example.
```py
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "pszemraj/bigbird-roberta-base-edu-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, attn_implementation="eager"
)

# score a single text; the model has a single-logit regression head
text = "This is a test sentence."
inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
outputs = model(**inputs)
logits = outputs.logits.squeeze(-1).float().detach().numpy()
score = logits.item()
result = {
    "text": text,
    "score": score,
    # clamp to the 0-5 range and round, as in the original fineweb-edu classifier
    "int_score": int(round(max(0, min(score, 5)))),
}

print(result)
# {'text': 'This is a test sentence.', 'score': 0.20170727372169495, 'int_score': 0}
```

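For GPU, a minimal sketch (assuming a CUDA device is available; only the device placement changes):

```py
# GPU variant: a minimal sketch, assuming a CUDA device is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True).to(device)
with torch.inference_mode():  # no gradients needed for scoring
    outputs = model(**inputs)
score = outputs.logits.squeeze(-1).float().cpu().item()
```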
## Details

This model is a fine-tuned version of [google/bigbird-roberta-base](https://huggingface.co/google/bigbird-roberta-base) on the HuggingFaceFW/fineweb-edu-llama3-annotations dataset.
It achieves the following results on the evaluation set:

- Loss: 0.2176
- MSE: 0.2176

## Intended uses & limitations

Refer to the HF classifier's [model card](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier#limitations) for more details.
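Since the model is tagged as a data filter, one common pattern is to batch-score documents and keep those at or above a threshold. The sketch below is hypothetical: the `filter_by_eduscore` helper is an assumption, and the keep-threshold of 3 is borrowed from the fineweb-edu setup.

```py
# Hypothetical filtering sketch: reuses `tokenizer` and `model` from the
# Usage example above. The threshold of 3 follows the fineweb-edu
# convention and is an assumption here; tune it for your corpus.
import torch

def filter_by_eduscore(texts, threshold=3, batch_size=8):
    kept = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        enc = tokenizer(batch, return_tensors="pt", padding="longest", truncation=True)
        with torch.inference_mode():
            scores = model(**enc).logits.squeeze(-1)  # shape (batch,)
        for doc, score in zip(batch, scores.tolist()):
            if int(round(max(0, min(score, 5)))) >= threshold:
                kept.append(doc)
    return kept
```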

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-09
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
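For reference, a hypothetical sketch of how these settings map onto `transformers` `TrainingArguments`; the `output_dir` name is assumed, and the remaining hyperparameters (learning rate, batch sizes, etc.) are not listed in this card:

```py
# Hypothetical sketch: maps the listed settings onto TrainingArguments.
# output_dir is assumed; learning rate and batch sizes are not shown here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bigbird-eduscore",  # assumed
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-9,
)
```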

### Training results

| Training Loss | Epoch  | Step | Validation Loss | MSE    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 0.4763        | 0.0288 | 100  | 0.4468          | 0.4468 |
| 0.3078        | 0.0577 | 200  | 0.3130          | 0.3130 |
| 0.3088        | 0.0865 | 300  | 0.2695          | 0.2695 |
| 0.2379        | 0.1153 | 400  | 0.2618          | 0.2618 |
| 0.289         | 0.1441 | 500  | 0.2583          | 0.2583 |
| 0.3049        | 0.1730 | 600  | 0.2723          | 0.2723 |
| 0.2292        | 0.2018 | 700  | 0.2477          | 0.2477 |
| 0.2677        | 0.2306 | 800  | 0.2369          | 0.2369 |
| 0.3181        | 0.2594 | 900  | 0.2307          | 0.2307 |
| 0.2551        | 0.2883 | 1000 | 0.2411          | 0.2411 |
| 0.2743        | 0.3171 | 1100 | 0.2350          | 0.2350 |
| 0.2383        | 0.3459 | 1200 | 0.2424          | 0.2424 |
| 0.2191        | 0.3747 | 1300 | 0.2279          | 0.2279 |
| 0.2431        | 0.4036 | 1400 | 0.2232          | 0.2232 |
| 0.2161        | 0.4324 | 1500 | 0.2307          | 0.2307 |
| 0.2459        | 0.4612 | 1600 | 0.2246          | 0.2246 |
| 0.2403        | 0.4900 | 1700 | 0.2232          | 0.2232 |
| 0.251         | 0.5189 | 1800 | 0.2421          | 0.2421 |
| 0.2565        | 0.5477 | 1900 | 0.2207          | 0.2207 |
| 0.2274        | 0.5765 | 2000 | 0.2294          | 0.2294 |
| 0.2272        | 0.6053 | 2100 | 0.2192          | 0.2192 |
| 0.2668        | 0.6342 | 2200 | 0.2204          | 0.2204 |
| 0.2434        | 0.6630 | 2300 | 0.2196          | 0.2196 |
| 0.2464        | 0.6918 | 2400 | 0.2185          | 0.2185 |
| 0.2338        | 0.7206 | 2500 | 0.2166          | 0.2166 |
| 0.243         | 0.7495 | 2600 | 0.2165          | 0.2165 |
| 0.1891        | 0.7783 | 2700 | 0.2201          | 0.2201 |
| 0.2355        | 0.8071 | 2800 | 0.2167          | 0.2167 |
| 0.2231        | 0.8359 | 2900 | 0.2168          | 0.2168 |
| 0.2274        | 0.8648 | 3000 | 0.2243          | 0.2243 |
| 0.2287        | 0.8936 | 3100 | 0.2203          | 0.2203 |
| 0.261         | 0.9224 | 3200 | 0.2186          | 0.2186 |
| 0.2187        | 0.9512 | 3300 | 0.2176          | 0.2176 |
| 0.2069        | 0.9801 | 3400 | 0.2178          | 0.2178 |

### Framework versions

- Transformers 4.42.3
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1