Update README.md
README.md CHANGED
```diff
@@ -141,10 +141,12 @@ Note that we are still actively developing our evaluation and the results/tasks
 
 <details>
 <summary>Some more details about the evaluation.</summary>
+
 - All the evaluation context length is determined by the llama-2 tokenizer to accommodate models with smaller vocabularies.
 - For Json KV and RAG, we randomly sample positions of the target key-value pairs or the passages to test “lost-in-the-middle”.
 - For ICL, we use abstract labels (0,1,2,3…) instead of natural language labels ([Pan et al., 2023](https://arxiv.org/pdf/2305.09731)) to evaluate models’ ability to learn new tasks.
 - We use greedy decoding for all models/tasks.
+
 </details>
 
 
```
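The bullets in this hunk compress several implementation choices. Below is a minimal sketch of how they might look in code, assuming the Hugging Face `transformers` API; the model name, helper function names, and data layout are placeholders for illustration, not the repository's actual implementation.

```python
# Hypothetical sketch (not the repository's actual code) of the evaluation
# choices described in the README hunk above. Assumes Hugging Face
# `transformers`; all names below are placeholders.
import random
from transformers import AutoModelForCausalLM, AutoTokenizer

# Context lengths are measured with the llama-2 tokenizer so that models with
# smaller vocabularies (and therefore longer token sequences for the same
# text) are still held to a comparable token budget.
length_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def fits_budget(text: str, max_tokens: int) -> bool:
    """Check a prompt against the budget using the reference tokenizer."""
    return len(length_tokenizer.encode(text)) <= max_tokens

def build_json_kv_prompt(target_kv: str, distractor_kvs: list[str]) -> str:
    """'Lost-in-the-middle': insert the target key-value pair at a random
    position among the distractors, rather than always first or last."""
    kvs = distractor_kvs.copy()
    kvs.insert(random.randrange(len(kvs) + 1), target_kv)
    return "\n".join(kvs)

def icl_label(natural_label: str, label_space: list[str]) -> str:
    """Map a natural-language label to an abstract one ('0', '1', '2', ...),
    so the model must learn the task mapping from the in-context examples."""
    return str(label_space.index(natural_label))

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model under evaluation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate_greedy(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy decoding for all models/tasks: sampling disabled."""
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=max_new_tokens)
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Measuring length with a single reference tokenizer is what makes the budget comparable across models: two tokenizers can count the same text very differently, so truncating with each model's own tokenizer would give them unequal amounts of context.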