Update README.md
README.md CHANGED
```diff
@@ -141,10 +141,12 @@ Note that we are still actively developing our evaluation and the results/tasks
 
 <details>
 <summary>Some more details about the evaluation.</summary>
+
 - All the evaluation context length is determined by the llama-2 tokenizer to accommodate models with smaller vocabularies.
 - For Json KV and RAG, we randomly sample positions of the target key-value pairs or the passages to test “lost-in-the-middle”.
 - For ICL, we use abstract labels (0,1,2,3…) instead of natural language labels ([Pan et al., 2023](https://arxiv.org/pdf/2305.09731)) to evaluate models’ ability to learn new tasks.
 - We use greedy decoding for all models/tasks.
+
 </details>
 
 
```
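The bullets in this hunk compress several implementation choices. Below is a minimal sketch of how they might look in code, assuming the Hugging Face `transformers` API; the model name, helper function names, and data layout are placeholders for illustration, not the repository's actual implementation.

```python
# Hypothetical sketch (not the repository's actual code) of the evaluation
# choices described in the README hunk above. Assumes Hugging Face
# `transformers`; all names below are placeholders.
import random
from transformers import AutoModelForCausalLM, AutoTokenizer

# Context lengths are measured with the llama-2 tokenizer so that models with
# smaller vocabularies (and therefore longer token sequences for the same
# text) are still held to a comparable token budget.
length_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def fits_budget(text: str, max_tokens: int) -> bool:
    """Check a prompt against the budget using the reference tokenizer."""
    return len(length_tokenizer.encode(text)) <= max_tokens

def build_json_kv_prompt(target_kv: str, distractor_kvs: list[str]) -> str:
    """'Lost-in-the-middle': insert the target key-value pair at a random
    position among the distractors, rather than always first or last."""
    kvs = distractor_kvs.copy()
    kvs.insert(random.randrange(len(kvs) + 1), target_kv)
    return "\n".join(kvs)

def icl_label(natural_label: str, label_space: list[str]) -> str:
    """Map a natural-language label to an abstract one ('0', '1', '2', ...),
    so the model must learn the task mapping from the in-context examples."""
    return str(label_space.index(natural_label))

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model under evaluation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate_greedy(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy decoding for all models/tasks: sampling disabled."""
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=max_new_tokens)
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Measuring length with a single reference tokenizer is what makes the budget comparable across models: two tokenizers can count the same text very differently, so truncating with each model's own tokenizer would give them unequal amounts of context.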