---
datasets:
- jondurbin/airoboros-gpt4-1.4.1
---

# RoPE Scaled QLoRA Finetune of airoboros-33b-gpt4-1.4.1 (LoRA)

Full model card and GPTQ 4bit quantized weights can be found here: https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-GPTQ

## Overview

This is [Jon Durbin's Airoboros 33B GPT4 1.4](https://huggingface.co/jondurbin/airoboros-33b-gpt4-1.4) (LoRA weights) with several key modifications:
- Context length extended to 8192 by RoPE Scaled Embeddings (see the sketch after this list), but NOT via the superHOT LoRA. I started with base Llama-33b.
- Training sequences beyond 2048 have the target truncated to equal 2048.
- Used the airoboros-gpt4-1.4.1 dataset instead of airoboros-gpt4-1.4.
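
In case the RoPE scaling point is unfamiliar: the idea is to interpolate position indices so that 8192 positions are compressed into the range the base model was pretrained on (a scale factor of 2048/8192 = 0.25). The snippet below is a minimal, self-contained PyTorch sketch of that idea; it is not the exact patch used for this finetune, and all function names in it are illustrative.

```python
import torch

def scaled_rope_tables(dim: int, max_len: int = 8192,
                       base: float = 10000.0,
                       scale: float = 2048 / 8192):
    """Precompute RoPE cos/sin tables with linearly interpolated positions.

    scale < 1 compresses position ids so that 8192 positions cover the same
    rotation range the base Llama model saw for its native 2048 context.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(max_len).float() * scale    # interpolated positions
    freqs = torch.outer(positions, inv_freq)              # (max_len, dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)                # (max_len, dim)
    return emb.cos(), emb.sin()

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q: torch.Tensor, k: torch.Tensor,
               cos: torch.Tensor, sin: torch.Tensor):
    """Rotate query/key tensors of shape (..., seq_len, dim) using the scaled tables."""
    seq_len = q.shape[-2]
    cos, sin = cos[:seq_len], sin[:seq_len]
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```

In practice this kind of scaling is typically applied by patching the model's rotary embedding module before the pretrained weights are loaded, so no architecture changes are needed.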

Otherwise, I emulated the training process as closely as possible (rank 64 QLoRA). It was trained on 1x RTX 6000 Ada for ~43 hours.
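
For context on what "rank 64 QLoRA" means in practice: the frozen base model is loaded in 4-bit (NF4) and rank-64 LoRA adapters are trained on top of it. Below is a hedged configuration sketch using Hugging Face `transformers` and `peft`; the alpha, dropout, and target-module choices are illustrative placeholders, not confirmed values from this run.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base weights -- the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Rank-64 LoRA adapters; alpha, dropout, and target_modules are placeholders.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```

These configs would then be passed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)` and `peft.get_peft_model(model, lora_config)` respectively before training.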