oopere committed · verified
Commit 024a466 · Parent(s): 698d650

Update README.md

Files changed (1):
  1. README.md +15 -13

README.md CHANGED
@@ -5,7 +5,7 @@ metrics:
 - accuracy
 - perplexity
 base_model:
-- meta-llama/Llama-3.2-1B
+- meta-llama/Llama-3.2-3B
 ---
 
 # Model Card for oopere/pruned20-llama-1b
@@ -31,20 +31,22 @@ This model is not intended to be used directly, but rather to be fine-tuned for
 
 | Benchmark | Original Model | Pruned Model | Relative Change |
 | ---- | ---- | ---- | ---- |
-| ARC-Easy | 65.19% | 53.03% | -18.7% |
-| BoolQ | 64.16% | 62.32% | -2.9% |
-| LAMBADA-OpenAI | 62.20% | 42.13% | -32.3% |
-| LAMBADA-Standard | 53.46% | 41.04% | -23.2% |
+| ARC-Easy | 65.19% | 60.69% | -6.9% |
+| BoolQ | 64.16% | 51.22% | -20.2% |
+| LAMBADA-OpenAI | 62.20% | 59.64% | -4.1% |
+| LAMBADA-Standard | 53.46% | 54.61% | +2.2% |
 
 ### Key Findings
-- Maintains strong performance on binary classification tasks (BoolQ)
-- Moderate degradation on reasoning tasks (ARC-Easy)
-- Significant impact on long-range comprehension (LAMBADA)
+- Surprisingly, an improvement is observed on the LAMBADA-Standard benchmark, with a 2.2% relative increase in accuracy.
+- Significant degradation on binary classification tasks (BoolQ), with a 20.2% relative decrease in accuracy.
+- Moderate degradation observed on reasoning tasks (ARC-Easy), with a 6.9% relative decrease in accuracy.
+- Minimal impact on long-range comprehension (LAMBADA-OpenAI), with only a 4.1% relative decrease in accuracy.
 
 ### Limitations
-- Reduced performance on tasks requiring complex language understanding
-- More significant degradation on tasks requiring long-range dependencies
-- May not be suitable for applications requiring high accuracy on language completion tasks
+- Reduced performance on tasks requiring complex reasoning, with moderate degradation observed on benchmarks like ARC-Easy.
+- Noticeable decrease in accuracy on binary classification tasks, as seen in BoolQ.
+- Mixed results on long-range dependencies, with minimal degradation on LAMBADA-OpenAI but variability across benchmarks.
+- May not be suitable for applications requiring consistently high accuracy across diverse language tasks.
 
 ### Implementation Details
 - **Pruning Notebook:** [Detailed implementation and methodology](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/6-PRUNING/6_3_pruning_structured_llama3.2-1b_OK.ipynb)
@@ -52,13 +54,13 @@ This model is not intended to be used directly, but rather to be fine-tuned for
 
 ### Pruning Method
 - **Technique:** Structured pruning targeting MLP layers
-- **Pruning Ratio:** 20% of neurons removed from MLP layers
+- **Pruning Ratio:** 10% of neurons removed from MLP layers
 - **Selection Criteria:** Importance scoring based on absolute maximum weights
 - **Architecture Specifics:** Maintained GLU structure during pruning
 
 ### Hardware Requirements
 - Reduced memory footprint compared to original model
-- Can run on hardware with ~20% less memory than original
+- Can run on hardware with ~10% less memory than original
 
 ## Acknowledgments
 - Thanks to [Mariusz Kurman](https://huggingface.co/mkurman) for creating [llama-pruning](https://github.com/MedITSolutionsKurman/llama-pruning), a library that extends and improves this pruning methodology.
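
A quick way to verify the updated "Relative Change" column is to recompute it as (pruned − original) / original; the accuracy values below come straight from the table, and the script itself is only an illustrative check:

```python
# Sanity check for the "Relative Change" column:
# relative change = (pruned - original) / original.
rows = {
    "ARC-Easy": (65.19, 60.69),
    "BoolQ": (64.16, 51.22),
    "LAMBADA-OpenAI": (62.20, 59.64),
    "LAMBADA-Standard": (53.46, 54.61),
}
for name, (original, pruned) in rows.items():
    print(f"{name}: {(pruned - original) / original:+.1%}")
# -> ARC-Easy: -6.9%, BoolQ: -20.2%, LAMBADA-OpenAI: -4.1%, LAMBADA-Standard: +2.2%
```

This also confirms that the LAMBADA-Standard row is an increase (+2.2%), matching the first key finding.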
 
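For readers who want the gist of the method without opening the notebook, here is a minimal sketch of the GLU-aware structured pruning described under "Pruning Method": rank each MLP neuron by the absolute maximum of its weights and drop the lowest-scoring fraction. It assumes a Llama-style MLP with gate_proj/up_proj/down_proj projections; `prune_glu_mlp` and the exact importance formula are illustrative, not the notebook's or llama-pruning's actual API.

```python
# Hypothetical sketch: structured pruning of a Llama-style GLU MLP, keeping
# the top (1 - ratio) neurons ranked by absolute-maximum weight importance.
import torch
import torch.nn as nn

def prune_glu_mlp(gate_proj: nn.Linear, up_proj: nn.Linear,
                  down_proj: nn.Linear, ratio: float = 0.10):
    # Importance per intermediate neuron: abs-max of its incoming weights in
    # both GLU branches, so gate and up stay aligned (preserving the GLU).
    importance = (gate_proj.weight.abs().max(dim=1).values
                  + up_proj.weight.abs().max(dim=1).values)
    n_keep = int(importance.numel() * (1.0 - ratio))
    keep = torch.topk(importance, n_keep).indices.sort().values

    def slice_linear(layer: nn.Linear, idx: torch.Tensor, dim: int) -> nn.Linear:
        # dim=0 slices output neurons; dim=1 slices input features.
        w = layer.weight.index_select(dim, idx)
        out_f, in_f = w.shape
        new = nn.Linear(in_f, out_f, bias=layer.bias is not None)
        new.weight = nn.Parameter(w)
        if layer.bias is not None:
            b = layer.bias if dim == 1 else layer.bias.index_select(0, idx)
            new.bias = nn.Parameter(b)
        return new

    # The same neuron indices are removed from the gate/up outputs and the
    # down-projection inputs, so the block's input/output shapes are unchanged.
    return (slice_linear(gate_proj, keep, 0),
            slice_linear(up_proj, keep, 0),
            slice_linear(down_proj, keep, 1))
```

Slicing gate_proj and up_proj along their output dimension and down_proj along its input dimension with one shared index set is what keeps the GLU structure intact; applied per transformer block, it shrinks each MLP's intermediate dimension by the pruning ratio, which is the source of the reduced memory footprint noted above.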
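
Since the card stresses that the model is meant to be fine-tuned rather than used directly, a hypothetical starting point is sketched below (repo id taken from the card title; the training setup itself is up to the user):

```python
# Load the pruned checkpoint as a causal LM for downstream fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "oopere/pruned20-llama-1b"  # assumed from the card title
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```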