# nanoT5-mid-65kBPE-2048

This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.

A "mid" size T5 model pretrained on c4:

- trained at a context length of 2048
- 16 layers, hidden size 1024, feed-forward dimension 3072, SiLU activations
- pretrained on allenai/c4 (English subset) for 65k steps
- uses an adapted claude3 tokenizer with a vocab size of 65k
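
The checkpoint can be loaded as a standard T5 model. A minimal loading sketch, assuming the repo follows the usual `transformers` T5 layout (config, safetensors weights, and tokenizer files):

```python
# Minimal loading sketch; assumes the standard transformers T5 layout.
from transformers import AutoTokenizer, T5ForConditionalGeneration

repo_id = "pszemraj/nanoT5-mid-65kBPE-2048"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id)

# Quick sanity check against the card: ~637M parameters, 65k vocab.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters, vocab size {model.config.vocab_size}")
```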

More details and training logs are available under `checkpoints/`.

Weights: 637M parameters, stored as F32 safetensors.
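
Since this is a raw checkpoint intended for downstream fine-tuning, here is a hedged fine-tuning skeleton using the `Seq2SeqTrainer` API. The dataset, column names, and hyperparameters below are illustrative placeholders, not settings from this repo:

```python
# Hypothetical fine-tuning skeleton; the dataset, column names, and
# hyperparameters are placeholders, not values from this repo.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
)

repo_id = "pszemraj/nanoT5-mid-65kBPE-2048"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id)

# Placeholder summarization dataset with "dialogue"/"summary" columns.
ds = load_dataset("samsum")

def preprocess(batch):
    enc = tokenizer(batch["dialogue"], max_length=2048, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
    enc["labels"] = labels["input_ids"]
    return enc

tokenized = ds.map(preprocess, batched=True, remove_columns=ds["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="nanoT5-mid-finetuned",
        per_device_train_batch_size=8,
        learning_rate=1e-4,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```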
