|
--- |
|
language: |
|
- en |
|
- de |
|
- fr |
|
- it |
|
- pt |
|
- hi |
|
- es |
|
- th |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
tags: |
|
- facebook |
|
- meta |
|
- pytorch |
|
- llama |
|
- llama-3 |
|
license: llama3.2 |
|
--- |
|
|
|
# Evolution Learning Network (ELN) with QLoRA and Genetic Algorithms for LLMs
|
|
|
## Overview |
|
|
|
This project implements an **Evolution Learning Network (ELN)** to fine-tune transformer-based models like LLaMA using a combination of **Quantized Low-Rank Adaptation (QLoRA)** and **Genetic Algorithms (GA)**. The primary objective is to evolve a population of models across multiple generations to optimize for performance (fitness) and specialization, while maintaining diversity. |
|
|
|
### Key Features |
|
- Efficient model fine-tuning using **QLoRA** with 4-bit quantization |
|
- Evolutionary strategies with tournament selection and blended crossover |
|
- Adaptive mutation rates based on generation progress |
|
- Comprehensive experiment tracking with **WandB** |
|
- Diversity maintenance through LoRA weight fingerprinting (sketched below)
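
The diversity term compares compact "fingerprints" of each population member's LoRA weights. The sketch below illustrates one way to do this, assuming each model is represented by a dictionary of LoRA tensors; the function names and the down-sampling scheme are assumptions for illustration, not the repository's exact implementation.

```python
import torch
import torch.nn.functional as F

def fingerprint(lora_weights: dict, dims: int = 64) -> torch.Tensor:
    """Flatten all LoRA tensors and down-sample them to a fixed-size vector."""
    flat = torch.cat([w.detach().flatten().float() for w in lora_weights.values()])
    stride = max(1, flat.numel() // dims)
    return flat[::stride][:dims]

def population_diversity(fingerprints: list) -> float:
    """Mean pairwise cosine distance across the population's fingerprints."""
    fps = F.normalize(torch.stack(fingerprints), dim=1)
    sims = fps @ fps.T                      # pairwise cosine similarities
    n = fps.shape[0]
    return ((1 - sims).sum() / (n * (n - 1))).item()
```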
|
|
|
## Model Details |
|
|
|
### Base Model |
|
- **Name**: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) |
|
- **Architecture**: Transformer-based causal language model |
|
|
|
### Quantization Configuration |
|
- **Type**: 4-bit quantization using `bitsandbytes` |
|
- **Parameters**: |
|
- Compute Type: `torch.float16`

- Quantization Type: `"nf4"` (4-bit NormalFloat)

- Double (Nested) Quantization: Enabled
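
In `transformers`, the settings above map onto a `BitsAndBytesConfig` roughly as follows (a minimal sketch; only the values listed above come from this card):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantization via bitsandbytes
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,        # double (nested) quantization
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=bnb_config,
    device_map="auto",
)
```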
|
|
|
### LoRA Configuration |
|
- **Rank (r)**: 8 |
|
- **Alpha**: 16 |
|
- **Target Modules**: `q_proj`, `v_proj` |
|
- **Dropout**: 0.05 |
|
- **Task Type**: `CAUSAL_LM` |
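
With `peft`, the corresponding `LoraConfig` looks roughly like this (a sketch that reuses `base_model` from the quantization example above):

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Prepare the 4-bit model for training, then wrap it with trainable LoRA adapters.
base_model = prepare_model_for_kbit_training(base_model)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
```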
|
|
|
### Training Configuration |
|
- **Optimizer**: `paged_adamw_8bit` |
|
- **Precision**: Mixed precision (`fp16`) |
|
- **Batch Size Range**: 2-16 (genome-controlled) |
|
- **Learning Rate Range**: 1e-6 to 1e-2 (genome-controlled) |
|
- **Epochs Range**: 1-4 (genome-controlled) |
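
Because batch size, learning rate, and epoch count are genome-controlled, each individual trains with its own `TrainingArguments`. The genome values below are hypothetical samples from the ranges above, not the evolved optimum:

```python
from transformers import TrainingArguments

# Hypothetical genome sampled from the ranges listed above.
genome = {"learning_rate": 2e-4, "batch_size": 4, "num_epochs": 2}

training_args = TrainingArguments(
    output_dir="./eln-individual-0",        # illustrative output path
    optim="paged_adamw_8bit",
    fp16=True,
    per_device_train_batch_size=genome["batch_size"],
    learning_rate=genome["learning_rate"],
    num_train_epochs=genome["num_epochs"],
    gradient_accumulation_steps=4,          # memory optimization (see hardware section)
    logging_steps=50,
    report_to="wandb",
)
```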
|
|
|
## Dataset |
|
|
|
### Source |
|
- **Name**: WikiText-2 Raw |
|
- **Configuration**: `wikitext-2-raw-v1` |
|
- **Processing**: |
|
- Max Length: 128 tokens |
|
- Padding: Fixed to max length |
|
- Splits: train, validation (general evaluation), test (task-specific evaluation)
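
A minimal preprocessing sketch for this setup, assuming the standard `datasets` loader for WikiText-2 (the helper name is illustrative):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    # Fixed-length padding/truncation to 128 tokens, as described above.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
```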
|
|
|
## Evolution Process |
|
|
|
### Population Management |
|
1. **Initialization**: |
|
- Population Size: 6 models |
|
- Initial random mutations (20% rate) |
|
- Randomized hyperparameter genomes |
|
|
|
2. **Selection & Evolution**: |
|
- Tournament selection (k=3) |
|
- Blended crossover of LoRA weights |
|
- Adaptive mutation rate that decreases as generations progress

- Hyperparameter mutation within controlled ranges (see the sketch after this list)
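
A minimal sketch of the selection, crossover, and mutation steps, operating on per-model LoRA state dictionaries (function names, blend weighting, and noise model are illustrative assumptions, not the exact implementation):

```python
import random
import torch

def tournament_select(population, fitnesses, k=3):
    # Sample k candidates at random and keep the fittest one.
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]

def blended_crossover(parent_a, parent_b, alpha=0.5):
    # Blend matching LoRA tensors from two parent adapters.
    return {name: alpha * parent_a[name] + (1 - alpha) * parent_b[name]
            for name in parent_a}

def mutate(lora_weights, generation, max_generations, base_rate=0.2):
    # Mutation strength decays as the run progresses (adaptive mutation rate).
    rate = base_rate * (1 - generation / max_generations)
    return {name: w + rate * torch.randn_like(w) for name, w in lora_weights.items()}
```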
|
|
|
## Experimental Results |
|
|
|
### Evolution Progress |
|
|
|
The evolutionary learning process was run for 8 generations with a population size of 6 models. The experiment tracked several key metrics across generations: |
|
|
|
**Evolution Metrics**
|
<div style="display: flex;"> |
|
<img src="https://huggingface.co/diabolic6045/ELN-llama-1B-adapter/resolve/main/images/output.png" alt="Evolution Metrics" style="width: 50%;height: 50%;"/></div> |
|
|
|
#### Fitness Progression |
|
- **Initial Performance**: Best fitness started at ~0.480 (Generation 1) |
|
- **Convergence**: Gradual decline to ~0.476 by Generation 8 |
|
- **Population Stability**: Average fitness closely tracked best fitness after Generation 2, indicating good convergence |
|
- **Fitness Range**: Maintained between 0.476-0.480 throughout evolution |
|
|
|
#### Specialization Trends |
|
- **High Baseline**: Started at ~0.9975 specialization |
|
- **Consistency**: Fluctuated minimally between 0.9975-0.9990 |
|
- **Peak Performance**: Reached ~0.9991 specialization in Generation 6 |
|
- **Population Average**: Maintained above 0.997 throughout evolution |
|
|
|
### Comparison with Standard Training |
|
|
|
 |
|
|
|
The comparison reveals several key differences between ELN and standard training: |
|
|
|
#### Fitness Metrics |
|
- **ELN**: 0.4762 final fitness with stable progression |
|
- **Standard**: 0.4779 final fitness with steeper learning curve |
|
- **Difference**: ~0.3% performance gap, favoring standard training |
|
|
|
#### Training Characteristics |
|
- **Loss Reduction**: |
|
- Standard: Sharp initial drop followed by gradual improvement |
|
- ELN: More controlled, stable descent |
|
- **Specialization**: |
|
- Standard: More variable specialization scores |
|
- ELN: Consistently high specialization maintenance |
|
|
|
#### Key Advantages of ELN |
|
1. More stable learning trajectory |
|
2. Better maintenance of model diversity |
|
3. Consistent specialization scores |
|
4. Reduced risk of catastrophic forgetting |
|
|
|
## Hardware & Framework Requirements |
|
|
|
### Hardware |
|
- Multi-GPU support via `DistributedDataParallel` |
|
- Memory optimization through gradient accumulation |
|
- Hardware monitoring (CPU/GPU usage) |
|
|
|
### Dependencies |
|
- transformers |
|
- peft |
|
- bitsandbytes |
|
- accelerate |
|
- wandb |
|
- torch >= 2.0 |
|
|
|
## Usage |
|
|
|
```python |
|
# Load model directly |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("diabolic6045/ELN-Llama-1B") |
|
model = AutoModelForCausalLM.from_pretrained("diabolic6045/ELN-Llama-1B") |
|
``` |
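
Alternatively, the evolved LoRA weights can be attached to the base model explicitly via `peft`. The adapter repository id below is assumed from the image URLs in this card:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the evolved LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "diabolic6045/ELN-llama-1B-adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
```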
|
|
|
## Framework Versions |
|
- PEFT 0.14.0 |
|
|
|
## Future Work |
|
- Explore larger population sizes and generations |
|
- Implement additional mutation strategies |
|
- Test on diverse datasets and tasks |
|
- Investigate multi-objective optimization |
|
|
|
--- |
|
|
|
## Citation |
|
|
|
If you use this work, please cite: |
|
|
|
```bibtex |
|
@misc{eln2024, |
|
title={Evolution Learning Network (ELN): Combining QLoRA and Genetic Algorithms for LLM Optimization}, |
|
year={2024}, |
|
howpublished={\url{https://github.com/diabolic6045/ELN-llama-1B-adapter}} |
|
} |
|
``` |
|
|
|
### Related Works |
|
|
|
This project builds upon several key papers and techniques: |
|
|
|
```bibtex |
|
@article{dettmers2023qlora, |
|
title={QLoRA: Efficient Finetuning of Quantized LLMs}, |
|
author={Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke}, |
|
journal={arXiv preprint arXiv:2305.14314}, |
|
year={2023} |
|
} |
|
|
|
@article{touvron2023llama, |
|
title={LLaMA: Open and Efficient Foundation Language Models},
|
author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume}, |
|
journal={arXiv preprint arXiv:2302.13971}, |
|
year={2023} |
|
} |
|
|
|
@article{such2017deep, |
|
title={Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning}, |
|
author={Such, Felipe Petroski and Madhavan, Vashisht and Conti, Edoardo and Lehman, Joel and Stanley, Kenneth O and Clune, Jeff}, |
|
journal={arXiv preprint arXiv:1712.06567}, |
|
year={2017} |
|
} |
|
|
|
@article{real2019regularized, |
|
title={Regularized Evolution for Image Classifier Architecture Search}, |
|
author={Real, Esteban and Aggarwal, Alok and Huang, Yanping and Le, Quoc V}, |
|
journal={Proceedings of the AAAI Conference on Artificial Intelligence}, |
|
volume={33}, |
|
number={01}, |
|
pages={4780--4789}, |
|
year={2019} |
|
} |
|
``` |
|
|
|
These citations cover: |
|
1. QLoRA quantization and fine-tuning technique |
|
2. The base LLaMA model architecture |
|
3. Deep neuroevolution fundamentals |
|
4. Regularized evolution in neural networks |
|
|
|
The implementation also draws inspiration from recent advances in evolutionary algorithms and neural architecture search. |
|
|
|
~ [diabolic6045](https://huggingface.co/diabolic6045) |
|
|
|
--- |
|
|