princeton-nlp committed
Commit 9223cce · verified · 1 Parent(s): d47f954

Update README.md

Files changed (1): README.md (+2 −0)
README.md CHANGED
@@ -141,10 +141,12 @@ Note that we are still actively developing our evaluation and the results/tasks

<details>
<summary>Some more details about the evaluation.</summary>
+
- All the evaluation context length is determined by the llama-2 tokenizer to accommodate models with smaller vocabularies.
- For Json KV and RAG, we randomly sample positions of the target key-value pairs or the passages to test “lost-in-the-middle”.
- For ICL, we use abstract labels (0,1,2,3…) instead of natural language labels ([Pan et al., 2023](https://arxiv.org/pdf/2305.09731)) to evaluate models’ ability to learn new tasks.
- We use greedy decoding for all models/tasks.
+
</details>

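For context, two of the conventions in the bullet list above (counting context length with the Llama-2 tokenizer and using greedy decoding) could look roughly like the sketch below. This is not the repository's actual evaluation code; the model name, token budget, and prompt are placeholders chosen for illustration.

```python
# Minimal sketch of the evaluation conventions described above.
# Model names, the token budget, and the prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Context length is counted with the Llama-2 tokenizer, so the budget stays
# comparable across models with smaller vocabularies.
length_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def fits_budget(prompt: str, budget: int) -> bool:
    """Check a prompt against the context budget using Llama-2 token counts."""
    return len(length_tokenizer(prompt).input_ids) <= budget

# Greedy decoding for all models/tasks: sampling disabled.
model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any evaluated model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "..."  # task-specific prompt assembled by the evaluation harness
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:],
                       skip_special_tokens=True))
```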