oopere committed · verified
Commit 024a466 · Parent(s): 698d650

Update README.md

Files changed (1):
  1. README.md +15 -13

README.md CHANGED
@@ -5,7 +5,7 @@ metrics:
 - accuracy
 - perplexity
 base_model:
-- meta-llama/Llama-3.2-1B
+- meta-llama/Llama-3.2-3B
 ---
 
 # Model Card for oopere/pruned20-llama-1b
@@ -31,20 +31,22 @@ This model is not intended to be used directly, but rather to be fine-tuned for
 
 | Benchmark | Original Model | Pruned Model | Relative Change |
 | ---- | ---- | ---- | ---- |
-| ARC-Easy | 65.19% | 53.03% | -18.7% |
-| BoolQ | 64.16% | 62.32% | -2.9% |
-| LAMBADA-OpenAI | 62.20% | 42.13% | -32.3% |
-| LAMBADA-Standard | 53.46% | 41.04% | -23.2% |
+| ARC-Easy | 65.19% | 60.69% | -6.9% |
+| BoolQ | 64.16% | 51.22% | -20.2% |
+| LAMBADA-OpenAI | 62.20% | 59.64% | -4.1% |
+| LAMBADA-Standard | 53.46% | 54.61% | +2.2% |
 
 ### Key Findings
-- Maintains strong performance on binary classification tasks (BoolQ)
-- Moderate degradation on reasoning tasks (ARC-Easy)
-- Significant impact on long-range comprehension (LAMBADA)
+- Surprisingly, an improvement is observed on the LAMBADA-Standard benchmark, with a 2.2% relative increase in accuracy.
+- Significant degradation on binary classification tasks (BoolQ), with a 20.2% relative decrease in accuracy.
+- Moderate degradation observed on reasoning tasks (ARC-Easy), with a 6.9% relative decrease in accuracy.
+- Minimal impact on long-range comprehension (LAMBADA-OpenAI), with only a 4.1% relative decrease in accuracy.
 
 ### Limitations
-- Reduced performance on tasks requiring complex language understanding
-- More significant degradation on tasks requiring long-range dependencies
-- May not be suitable for applications requiring high accuracy on language completion tasks
+- Reduced performance on tasks requiring complex reasoning, with moderate degradation observed on benchmarks like ARC-Easy.
+- Noticeable decrease in accuracy on binary classification tasks, as seen in BoolQ.
+- Mixed results on long-range dependencies, with minimal degradation on LAMBADA-OpenAI but variability across benchmarks.
+- May not be suitable for applications requiring consistently high accuracy across diverse language tasks.
 
 ### Implementation Details
 - **Pruning Notebook:** [Detailed implementation and methodology](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/6-PRUNING/6_3_pruning_structured_llama3.2-1b_OK.ipynb)
@@ -52,13 +54,13 @@ This model is not intended to be used directly, but rather to be fine-tuned for
 
 ### Pruning Method
 - **Technique:** Structured pruning targeting MLP layers
-- **Pruning Ratio:** 20% of neurons removed from MLP layers
+- **Pruning Ratio:** 10% of neurons removed from MLP layers
 - **Selection Criteria:** Importance scoring based on absolute maximum weights
 - **Architecture Specifics:** Maintained GLU structure during pruning
 
 ### Hardware Requirements
 - Reduced memory footprint compared to original model
-- Can run on hardware with ~20% less memory than original
+- Can run on hardware with ~10% less memory than original
 
 ## Acknowledgments
 - Thanks to [Mariusz Kurman](https://huggingface.co/mkurman) for creating [llama-pruning](https://github.com/MedITSolutionsKurman/llama-pruning), a library that extends and improves this pruning methodology.
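
A quick way to verify the updated "Relative Change" column is to recompute it as (pruned − original) / original; the accuracy values below come straight from the table, and the script itself is only an illustrative check:

```python
# Sanity check for the "Relative Change" column:
# relative change = (pruned - original) / original.
rows = {
    "ARC-Easy": (65.19, 60.69),
    "BoolQ": (64.16, 51.22),
    "LAMBADA-OpenAI": (62.20, 59.64),
    "LAMBADA-Standard": (53.46, 54.61),
}
for name, (original, pruned) in rows.items():
    print(f"{name}: {(pruned - original) / original:+.1%}")
# -> ARC-Easy: -6.9%, BoolQ: -20.2%, LAMBADA-OpenAI: -4.1%, LAMBADA-Standard: +2.2%
```

This also confirms that the LAMBADA-Standard row is an increase (+2.2%), matching the first key finding.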
 
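For readers who want the gist of the method without opening the notebook, here is a minimal sketch of the GLU-aware structured pruning described under "Pruning Method": rank each MLP neuron by the absolute maximum of its weights and drop the lowest-scoring fraction. It assumes a Llama-style MLP with gate_proj/up_proj/down_proj projections; `prune_glu_mlp` and the exact importance formula are illustrative, not the notebook's or llama-pruning's actual API.

```python
# Hypothetical sketch: structured pruning of a Llama-style GLU MLP, keeping
# the top (1 - ratio) neurons ranked by absolute-maximum weight importance.
import torch
import torch.nn as nn

def prune_glu_mlp(gate_proj: nn.Linear, up_proj: nn.Linear,
                  down_proj: nn.Linear, ratio: float = 0.10):
    # Importance per intermediate neuron: abs-max of its incoming weights in
    # both GLU branches, so gate and up stay aligned (preserving the GLU).
    importance = (gate_proj.weight.abs().max(dim=1).values
                  + up_proj.weight.abs().max(dim=1).values)
    n_keep = int(importance.numel() * (1.0 - ratio))
    keep = torch.topk(importance, n_keep).indices.sort().values

    def slice_linear(layer: nn.Linear, idx: torch.Tensor, dim: int) -> nn.Linear:
        # dim=0 slices output neurons; dim=1 slices input features.
        w = layer.weight.index_select(dim, idx)
        out_f, in_f = w.shape
        new = nn.Linear(in_f, out_f, bias=layer.bias is not None)
        new.weight = nn.Parameter(w)
        if layer.bias is not None:
            b = layer.bias if dim == 1 else layer.bias.index_select(0, idx)
            new.bias = nn.Parameter(b)
        return new

    # The same neuron indices are removed from the gate/up outputs and the
    # down-projection inputs, so the block's input/output shapes are unchanged.
    return (slice_linear(gate_proj, keep, 0),
            slice_linear(up_proj, keep, 0),
            slice_linear(down_proj, keep, 1))
```

Slicing gate_proj and up_proj along their output dimension and down_proj along its input dimension with one shared index set is what keeps the GLU structure intact; applied per transformer block, it shrinks each MLP's intermediate dimension by the pruning ratio, which is the source of the reduced memory footprint noted above.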
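
Since the card stresses that the model is meant to be fine-tuned rather than used directly, a hypothetical starting point is sketched below (repo id taken from the card title; the training setup itself is up to the user):

```python
# Load the pruned checkpoint as a causal LM for downstream fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "oopere/pruned20-llama-1b"  # assumed from the card title
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```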