---
language:
- ru
license: apache-2.0
---

# FRED-T5 1.7B (Full-scale Russian Enhanced Denoisers T5)

The architecture is based on T5. It has 24 layers and a hidden size of 1536.

The model was trained on a mixture of 7 denoisers, similar to UL2 but with several differences (https://arxiv.org/abs/2205.05131).

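To make the idea of a denoiser mixture concrete, here is a minimal sketch in plain Python. It is not the FRED-T5 training code: the three denoiser settings, the span-sampling heuristic, and the function names are assumptions chosen for illustration, since this card does not list the 7 denoisers or how they differ from UL2. The `<extra_id_N>` markers follow the usual T5-style sentinel convention.

```python
import random

# Hypothetical denoiser settings: the model card does not list the 7
# denoisers, so these three entries are placeholders for illustration.
DENOISERS = [
    {"name": "short_spans", "mean_span": 3, "rate": 0.15},
    {"name": "long_spans", "mean_span": 8, "rate": 0.5},
    {"name": "prefix_lm", "mean_span": None, "rate": None},
]


def span_corrupt(tokens, mean_span, rate, rng):
    """Mask random spans with sentinel tokens; return (inputs, targets)."""
    inputs, targets = [], []
    budget = max(1, round(len(tokens) * rate))  # upper bound on masked tokens
    sentinel = 0
    i = 0
    while i < len(tokens):
        # Heuristic span start; not the exact sampling procedure used by UL2.
        if budget > 0 and rng.random() < rate / mean_span:
            length = max(1, round(rng.expovariate(1.0 / mean_span)))
            length = min(length, budget, len(tokens) - i)
            mark = f"<extra_id_{sentinel}>"
            inputs.append(mark)
            targets.append(mark)
            targets.extend(tokens[i:i + length])
            budget -= length
            sentinel += 1
            i += length
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets


def prefix_lm(tokens, rng):
    """Keep a random prefix as input and predict the rest (needs >= 2 tokens)."""
    split = rng.randint(1, len(tokens) - 1)
    return tokens[:split] + ["<extra_id_0>"], ["<extra_id_0>"] + tokens[split:]


def make_example(tokens, rng):
    """Pick one denoiser uniformly at random and apply it to the tokens."""
    cfg = rng.choice(DENOISERS)
    if cfg["mean_span"] is None:
        inputs, targets = prefix_lm(tokens, rng)
    else:
        inputs, targets = span_corrupt(tokens, cfg["mean_span"], cfg["rate"], rng)
    return cfg["name"], inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    words = "the quick brown fox jumps over the lazy dog again".split()
    print(make_example(words, rng))
```

Each call yields an input with masked spans and a target that lists the sentinels followed by the tokens they replaced, so substituting the target spans back into the input recovers the original sequence.
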
It was trained on a 300 GB Russian-language corpus, the same dataset used for the ruT5 models.

The tokenizer is byte-level BPE (BBPE).

For the first half of training, the model was trained on a small subset of the full dataset (1%, 3 GB) and without task prefixes.

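The two-phase schedule can be sketched as follows. This is only an illustration under stated assumptions: the card describes the first half of training, so the second-half settings (full dataset, prefixes enabled) are inferred rather than stated, "half" is measured here in steps rather than wall-clock time, and `training_phase`, `format_input`, and the `<P>` prefix are hypothetical names.

```python
CORPUS_GB = 300  # corpus size stated in the model card


def training_phase(step, total_steps):
    """Return the data fraction and prefix setting for a training step."""
    first_half = step < total_steps // 2
    return {
        # 1% subset in the first half; full data afterwards is an assumption.
        "data_fraction": 0.01 if first_half else 1.0,
        # No prefixes in the first half; enabling them later is an assumption.
        "use_prefix": not first_half,
    }


def format_input(text, prefix, step, total_steps):
    """Prepend the (hypothetical) task prefix only when the phase uses one."""
    if training_phase(step, total_steps)["use_prefix"]:
        return f"{prefix}{text}"
    return text


if __name__ == "__main__":
    # 1% of the 300 GB corpus is the 3 GB subset mentioned above.
    print(CORPUS_GB * training_phase(0, 100)["data_fraction"], "GB")
    print(format_input("some text", "<P>", 10, 100))
    print(format_input("some text", "<P>", 90, 100))
```
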
For RSG (Russian SuperGLUE), we trained as described in the T5 paper. First, we trained a multitask model on all tasks. Then, for each task, we took the best checkpoint and trained it further.

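The checkpoint-selection step of this procedure might look like the sketch below. The function name, checkpoint names, task names, and scores are all made up for illustration and are not actual RSG results; the per-task fine-tuning that follows selection is left as a comment because the card does not describe its settings.

```python
def best_checkpoint_per_task(scores):
    """Map each task to the multitask checkpoint with the highest dev score.

    `scores` has the shape {checkpoint_name: {task_name: dev_score}}.
    On ties, the checkpoint seen first is kept.
    """
    best = {}
    for ckpt, per_task in scores.items():
        for task, score in per_task.items():
            if task not in best or score > scores[best[task]][task]:
                best[task] = ckpt
    return best


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from FRED-T5.
    dev_scores = {
        "step_1000": {"task_a": 0.61, "task_b": 0.70},
        "step_2000": {"task_a": 0.65, "task_b": 0.68},
    }
    for task, ckpt in best_checkpoint_per_task(dev_scores).items():
        # Each selected checkpoint would then be trained further on its task.
        print(task, "->", ckpt)
```
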
Training loss:

![Screenshot 2023-01-21 at 11.36.52.png](s3://crabby-images/f252d/f252deda7f465d710401ee4e938dfffaa149717e)

We are continuing to experiment.

We will share more details and release the checkpoint to the public soon.