nanoT5-mid-65kBPE-2048
This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.
A "mid"-size T5 model pretrained on C4:
- trained @ context length 2048
- 16 layers, hidden size 1024, FF dimension 3072, SiLU activations (see the config sketch below)
- pretrained on allenai/c4 (en subset) for 65k steps
- uses an adapted claude3 tokenizer; vocab size 65k
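
For reference, a minimal sketch of a matching `T5Config` in `transformers`. The head count, whether the feed-forward block is gated, and the exact vocab size are assumptions not stated in this card.

```python
from transformers import T5Config

# Sketch of a config matching the sizes listed above; values marked
# "assumption" are not stated in the card.
config = T5Config(
    vocab_size=65_024,               # "vocab size 65k" -- exact figure is an assumption
    d_model=1024,                    # hidden size
    d_ff=3072,                       # feed-forward dimension
    num_layers=16,                   # encoder depth; decoder defaults to the same (assumption)
    num_heads=16,                    # assumption: 1024 split into 64-dim heads
    feed_forward_proj="gated-silu",  # SiLU activations; gating is an assumption
)
# Note: T5 uses relative position biases, so the 2048-token training context
# does not appear as a config field.
```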
More details and training logs can be found under `checkpoints/`.
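
A minimal sketch of loading the checkpoint for fine-tuning with `transformers`, assuming it is stored in standard T5 format; `repo_id` below is a placeholder for this model's actual Hub path.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

repo_id = "nanoT5-mid-65kBPE-2048"  # placeholder: substitute this model's Hub path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id)

# Toy seq2seq step: encode an input/target pair and backpropagate the loss
inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.", return_tensors="pt")
labels = tokenizer("A fox jumps over a dog.", return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss
loss.backward()
```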