Update README.md
README.md
CHANGED
@@ -1,11 +1,8 @@
 ---
 datasets:
 - jondurbin/airoboros-2.1
-- kmfoda/booksum
 ---
 
-
-
 # Extended Context (via YaRN) Llama-2-13b with airoboros-2.1 (fp16)
 
 
@@ -30,7 +27,7 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame
 
 ## Motivation
 
-
+[Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs whose architectures employ RoPE, with minimal additional training requirements. It grew out of efforts to mitigate the shortcomings of earlier methods such as Position Interpolation (PI) and NTK-aware scaling. Since I am unaware of any existing instruction-tuned models that employ YaRN, I finetuned this model using Jon Durbin's latest airoboros dataset.
 
 ## Relative Performance (wikitext perplexity)
 
@@ -43,9 +40,6 @@ Y
 | 8192 | **4.71** | **4.71** | 4.90 | 5.32 | Not Tested | 57.1 |
 | 12000 | **4.54** | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
 
-- Larger PI scaling factors increase short context performance degradation. If you don't require 16k context, you're better off using a model with a different context extension method, or a smaller (or no) PI scaling factor. Given this, don't expect anything special from this model on the HF leaderboard. Whether or not this is relevant to you will depend on your intended use case.
-- Beyond 8k, this model has lower perplexity than all other models tested here.
-- I'm actively exploring/implementing other context extension methods that may ameliorate the tendency of PI methods to impair the ability of the model to attend to the context space equally.
 
 ## Prompting:
 
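For readers unfamiliar with the scaling methods named in the Motivation paragraph added above, the sketch below contrasts Position Interpolation with NTK-aware scaling as applied to RoPE frequencies. It is an illustrative aside, not part of the README or the YaRN repository: the function names are invented for this sketch, the head dimension (128) and base (10000) are Llama-2 defaults, and YaRN itself combines further per-frequency adjustments that are not reproduced here.

```python
import numpy as np

def rope_inv_freq(dim=128, base=10000.0):
    # Standard RoPE inverse frequencies: theta_j = base^(-2j/dim)
    return base ** (-np.arange(0, dim, 2, dtype=np.float64) / dim)

def pi_angles(position, scale, dim=128, base=10000.0):
    # Position Interpolation (PI): keep the frequencies, but divide positions by
    # the scaling factor so a longer context maps back into the trained range.
    return (position / scale) * rope_inv_freq(dim, base)

def ntk_angles(position, scale, dim=128, base=10000.0):
    # NTK-aware scaling: keep positions unchanged, but enlarge the RoPE base so
    # low-frequency components are stretched more than high-frequency ones.
    new_base = base * scale ** (dim / (dim - 2))
    return position * rope_inv_freq(dim, new_base)

# Rotation angles for the last position of an 8192-token context, with a 2x
# scaling factor relative to Llama-2's 4096-token training length.
print(pi_angles(8191, scale=2.0)[:4])
print(ntk_angles(8191, scale=2.0)[:4])
```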