juliekallini committed on
Commit a8660d7 · verified · 1 Parent(s): e598159

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -33,7 +33,7 @@ datasets:
 
  <!-- Provide a quick summary of what the model is/does. -->
 
- **MrT5** (**M**e**r**ge**T5**) is a more efficient variant of ByT5 that integrates a token deletion mechanism in its encoder to *dynamically* shorten the input sequence length. After processing through a fixed number of encoder layers, a learned *delete gate* determines which tokens are to be removed and which are to be retained for subsequent layers. By effectively "merging" critical information from deleted tokens into a more compact sequence, MrT5 presents a solution to the practical limitations of existing byte-level models.
+ **MrT5** (**M**e**r**ge**T5**) is a more efficient variant of [ByT5 (Xue et al., 2022)](https://arxiv.org/abs/2105.13626) that integrates a token deletion mechanism in its encoder to *dynamically* shorten the input sequence length. After processing through a fixed number of encoder layers, a learned *delete gate* determines which tokens are to be removed and which are to be retained for subsequent layers. By effectively "merging" critical information from deleted tokens into a more compact sequence, MrT5 presents a solution to the practical limitations of existing byte-level models.
 
  ## Citation
 