Update README.md
README.md
CHANGED
@@ -27,18 +27,18 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame
## Motivation
-[Yet another RoPE extensioN method (YARN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs, with architectures employing RoPE, with minimal additional training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling.
## Relative Performance (wikitext perplexity)
-| Context (tokens)
-| --- | --- | ---| ----- | -----| ------| --- |
-| 512 | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
-| 1024 | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
-| 2048 | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
-| 4096 | 4.94 | 4.90 | 5.08 | 5.50 | 4.91 | **4.77** |
-| 8192 |
-| 12000 |
## Prompting:
## Motivation
+[Yet another RoPE extensioN method (YARN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs, with architectures employing RoPE, with minimal additional training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. This model is an attempt to enable the community to assess the capabilities of this extension method in real-world applications.
## Relative Performance (wikitext perplexity)
+| Context (tokens) | **bhenrym14/airoboros-l2-13b-2.1-YaRN-64k** | bhenrym14/airoboros-l2-13b-PI-16k-fp16 | bhenrym14/airophin-v2-13b-PI-8k-fp16 | bhenrym14/airophin-13b-pntk-16k-fp16| bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 |bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
+| --- | --- |--- | ---| ----- | -----| ------| --- |
+| 512 | | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
+| 1024 | | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
+| 2048 | | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
+| 4096 | |4.94 | 4.90 | 5.08 | 5.50 | 4.91 | **4.77** |
+| 8192 | |**4.71** | **4.71** | 4.90 | 5.32 | Not Tested | 57.1 |
+| 12000 | |**4.54** | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
## Prompting: