namespace-Pt committed
Commit 0851965 (verified) · Parent(s): 612696d

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +7 -7
README.md CHANGED
@@ -25,11 +25,11 @@ We evaluate the model on the Needle-In-A-HayStack task using the official settin
 ## LongBench
 We evaluate the model on [LongBench](https://arxiv.org/abs/2308.14508) using 32K context length and the official prompt template. For [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), we use 8K context length.
 
-|Model|Single-Doc QA|Multi-Doc QA|Summarization|Few-Shot Learning|
-|:-:|:-:|:-:|:-:|:-:|
-|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|37.33|36.04|26.83|69.56|
-|[gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|37.29|31.20|26.18|67.25|
-|[Llama-3-8B-Instruct-80K-QLoRA]()|43.57|43.07|28.93|69.15|
+|Model|Single-Doc QA|Multi-Doc QA|Summarization|Few-Shot Learning|Synthetic|Code|
+|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
+|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|37.33|36.04|26.83|69.56|37.75|53.24|
+|[gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|37.29|31.20|26.18|67.25|44.25|**62.71**|
+|[Llama-3-8B-Instruct-80K-QLoRA]()|**43.57**|**43.07**|**28.93**|**69.15**|**48.50**|51.95|
 
 ## InfiniteBench
 We evaluate the model on [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf) using 80K context length and the official prompt template. The results of GPT4 are copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf). For [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), we use 8K context length.
@@ -39,7 +39,7 @@ We evaluate the model on [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf) u
 |GPT4|22.22|
 |[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|7.00|
 |[gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|20.30|
-|[Llama-3-8B-Instruct-80K-QLoRA]()|30.92|
+|[Llama-3-8B-Instruct-80K-QLoRA]()|**30.92**|
 
 ## Topic Retrieval
 We evaluate the model on the [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/) task with `[5,10,15,20,25,30,40,50,60,70]` topics.
@@ -52,7 +52,7 @@ We evaluate the model's zero-shot performance on MMLU benchmark as a reflection
 
 |Model|STEM|Social Sciences|Humanities|Others|Avg|
 |:-:|:-:|:-:|:-:|:-:|:-:|
-|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|53.87|75.66|69.44|69.75|65.91|
+|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|**53.87**|**75.66**|**69.44**|**69.75**|**65.91**|
 |[gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|52.10|73.26|67.15|69.80|64.34|
 |[Llama-3-8B-Instruct-80K-QLoRA]()|53.10|73.24|67.32|68.79|64.44|
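The 32K and 80K context lengths in the diff above act as a per-example token budget at evaluation time. As a rough sketch of that setup, not the authors' script: the repo id below assumes the merged checkpoint `namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged`, and the truncation helper and input file are illustrative, though LongBench's official evaluation code truncates over-budget inputs from the middle in the same spirit.

```python
# Minimal sketch (assumptions noted above): enforce a fixed context budget by
# dropping tokens from the middle of over-long prompts before generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged"  # assumed merged checkpoint
MAX_LENGTH = 80 * 1024  # the 80K-token budget used for InfiniteBench above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype=torch.bfloat16  # device_map needs accelerate
)

def truncate_middle(prompt: str, max_length: int) -> str:
    """Keep the head and tail of the prompt and drop the middle, so the task
    instruction (head) and the question (tail) both survive truncation."""
    ids = tokenizer(prompt, add_special_tokens=False).input_ids
    if len(ids) <= max_length:
        return prompt
    half = max_length // 2
    return tokenizer.decode(ids[:half]) + tokenizer.decode(ids[-half:])

prompt = truncate_middle(open("task_input.txt").read(), MAX_LENGTH)  # illustrative file
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

For the 8K baseline runs mentioned above, the same loop applies with `MAX_LENGTH = 8 * 1024`; only the budget changes, not the prompt template.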