Update README.md
README.md CHANGED
@@ -33,7 +33,7 @@ datasets:
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-**MrT5** (**M**e**r**ge**T5**) is a more efficient variant of ByT5 that integrates a token deletion mechanism in its encoder to *dynamically* shorten the input sequence length. After processing through a fixed number of encoder layers, a learned *delete gate* determines which tokens are to be removed and which are to be retained for subsequent layers. By effectively "merging" critical information from deleted tokens into a more compact sequence, MrT5 presents a solution to the practical limitations of existing byte-level models.
+**MrT5** (**M**e**r**ge**T5**) is a more efficient variant of [ByT5 (Xue et al., 2022)](https://arxiv.org/abs/2105.13626) that integrates a token deletion mechanism in its encoder to *dynamically* shorten the input sequence length. After processing through a fixed number of encoder layers, a learned *delete gate* determines which tokens are to be removed and which are to be retained for subsequent layers. By effectively "merging" critical information from deleted tokens into a more compact sequence, MrT5 presents a solution to the practical limitations of existing byte-level models.
 
 ## Citation
 
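As a rough illustration of the delete-gate idea described in the updated paragraph, the sketch below shows one way such a gate could be wired up in PyTorch. This is not MrT5's actual implementation: the `DeleteGate` module name, the 0.5 threshold, and all tensor shapes are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class DeleteGate(nn.Module):
    """Hypothetical sketch: score each token's hidden state with a sigmoid gate
    and mark low-scoring tokens for deletion by zeroing them in the attention mask."""

    def __init__(self, hidden_size: int, threshold: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # one deletion logit per token
        self.threshold = threshold               # assumed cutoff, not from the paper

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        gate = torch.sigmoid(self.scorer(hidden_states)).squeeze(-1)  # (batch, seq_len)
        keep = (gate > self.threshold) & attention_mask.bool()
        # Subsequent encoder layers attend only to the retained positions,
        # so the effective sequence length shrinks.
        return attention_mask * keep.long(), gate


# Hypothetical usage after a fixed number of encoder layers:
gate = DeleteGate(hidden_size=512)              # arbitrary hidden size
hidden = torch.randn(2, 64, 512)                # dummy encoder activations
mask = torch.ones(2, 64, dtype=torch.long)
new_mask, scores = gate(hidden, mask)
print(new_mask.sum(dim=-1))                     # retained tokens per sequence
```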