Update README.md
README.md CHANGED
@@ -2,10 +2,10 @@
 license: apache-2.0
 ---
 
-# LimaRP-Llama2-7B-v3 (Alpaca, experimental,
+# LimaRP-Llama2-7B-v3 (Alpaca, experimental, 8-bit LoRA adapter)
 
 This is an experimental version of LimaRP for Llama2 with an updated dataset (1800 training samples)
-and a 2-pass training procedure. The first pass includes unsupervised finetuning on
+and a 2-pass training procedure. The first pass includes unsupervised finetuning on about 6800 stories within
 4k tokens length and the second pass is LimaRP with changes introducing more effective control on response length.
 
 For more details about LimaRP, see the model page for the [previously released version](https://huggingface.co/lemonilia/limarp-llama2-v2).
@@ -81,7 +81,7 @@ your desired response length:
 
 ## Training procedure
 [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
-on a
+on a 4x NVidia A40 GPU. The model has been trained as an 8-bit LoRA adapter, and
 it's so large because a LoRA rank of 256 was also used. The reasoning was that this
 might have helped the model internalize any newly acquired information, making the
 training process closer to a full finetune.
@@ -92,17 +92,17 @@ models).
 ### Training hyperparameters
 For the first pass these settings were used:
 
-- learning_rate: 0.
+- learning_rate: 0.00065
 - lr_scheduler_type: constant
 - lora_r: 256
 - lora_alpha: 16
-- lora_dropout: 0.
+- lora_dropout: 0.05
 - lora_target_linear: True
 - num_epochs: 1
 - bf16: True
 - tf32: True
--
-- adapter:
+- load_in_8bit: True
+- adapter: lora
 - micro_batch_size: 2
 - gradient_accumulation_steps: 1
 - optimizer: adamw_torch
@@ -111,6 +111,5 @@ In the second pass, the `lora_model_dir` option was used to load and train the a
 previously trained on a stories dataset. These settings were also changed:
 
 - lora_dropout: 0.0
-
-
-- learning_rate: 0.0006
+
+Using 4 GPUs, the effective global batch size would have been 8.
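The first-pass hyperparameters added in this revision map one-to-one onto keys in an Axolotl YAML config. As a rough illustration only, a minimal config fragment assembled from those bullets might look like the sketch below; key names are copied from the README listing, while `base_model` and `sequence_len` are assumptions (the Llama-2 7B base and the 4k token length mentioned in the intro), not values taken from the actual training run.

```yaml
# Hypothetical sketch of a first-pass Axolotl config fragment, assembled from the
# hyperparameters listed in the updated README. Key names follow that list;
# base_model and sequence_len are assumptions, not values from the real config.
base_model: meta-llama/Llama-2-7b-hf   # assumption: Llama-2 7B base model
load_in_8bit: True
adapter: lora
lora_r: 256
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: True

sequence_len: 4096                     # assumption: matches the stated 4k token length
micro_batch_size: 2
gradient_accumulation_steps: 1
num_epochs: 1
optimizer: adamw_torch
learning_rate: 0.00065
lr_scheduler_type: constant
bf16: True
tf32: True
```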
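For the second pass, the README states that the adapter produced by the first pass is loaded through `lora_model_dir` and that `lora_dropout` is changed to 0.0. A hedged sketch of just those overrides, with the adapter path as a placeholder:

```yaml
# Hypothetical second-pass overrides; the adapter directory is a placeholder,
# not the actual output path used for the first pass.
lora_model_dir: ./limarp-pass1-adapter
lora_dropout: 0.0
```

The closing note about batch size is consistent with the listed settings: with `micro_batch_size: 2`, `gradient_accumulation_steps: 1` and 4 GPUs, the effective global batch size works out to 2 × 1 × 4 = 8.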