<div align="center">
<h1>Llama-3-8B-Instruct-80K-QLoRA</h1>

<a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora">[Data&Code]</a>
</div>

We extend the context length of Llama-3-8B-Instruct to 80K using QLoRA and 3.5K long-context training examples synthesized with GPT-4. The entire training cycle is highly efficient, taking 8 hours on an 8xA800 (80G) machine. Yet the resulting model achieves remarkable performance on a series of downstream long-context evaluation benchmarks.
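
For reference, below is a minimal usage sketch with Hugging Face `transformers`. It assumes merged full weights are available under this repository; if only the QLoRA adapter is released, it would instead need to be loaded with `peft` on top of `meta-llama/Meta-Llama-3-8B-Instruct`. The repository id, dtype, and generation settings are illustrative assumptions; see the linked repo for the exact inference setup.

```python
# Minimal sketch (not the official recipe): long-context generation with transformers.
# The repository id, dtype, and generation settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Llama-3-8B-Instruct-80K-QLoRA"  # placeholder; use the actual Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on A800/A100-class GPUs
    device_map="auto",
)

# Long documents (up to ~80K tokens) go directly into the chat template.
messages = [{"role": "user", "content": "<your long document>\n\nSummarize the document above."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```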

# Evaluation

All the following evaluation results can be reproduced by following the instructions [here](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora).

## Needle in a Haystack

We evaluate the model on the Needle-In-A-Haystack task using the official setting. The blue vertical line indicates the training context length, i.e., 80K.
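
For readers unfamiliar with the task, the sketch below only illustrates the basic idea (it is not the official evaluation code, which lives in the repository linked above): a short "needle" fact is buried at a chosen depth inside a long filler context, and the model is asked to retrieve it; accuracy is then reported over a grid of context lengths and insertion depths. The function name and prompt wording are placeholders.

```python
# Conceptual Needle-In-A-Haystack prompt builder (illustrative only; the official
# evaluation setting is implemented in the repository linked above).
def build_niah_prompt(tokenizer, needle: str, filler: str, depth: float, num_tokens: int) -> str:
    """Bury `needle` at relative `depth` (0.0 = start, 1.0 = end) inside a
    haystack of roughly `num_tokens` tokens built by repeating `filler`."""
    filler_ids = tokenizer(filler, add_special_tokens=False)["input_ids"]
    haystack_ids = (filler_ids * (num_tokens // len(filler_ids) + 1))[:num_tokens]
    needle_ids = tokenizer("\n" + needle + "\n", add_special_tokens=False)["input_ids"]
    insert_at = int(len(haystack_ids) * depth)
    context = tokenizer.decode(haystack_ids[:insert_at] + needle_ids + haystack_ids[insert_at:])
    return context + "\n\nWhat is the special magic number mentioned in the document above?"
```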