Update README.md
README.md CHANGED
```diff
@@ -2,13 +2,21 @@
 license: apache-2.0
 datasets:
 - allenai/c4
+language:
+- en
 ---
 
 # nanoT5-mid-65kBPE-2048
 
+> [!NOTE]
+> This is a "raw" pretrained model intended to be fine-tuned on downstream tasks
+
+
 A "mid" size T5 model pretrained on c4:
 
 - trained @ context length 2048
 - 16 layers, hidden size 1024, FF 3072. SiLU activations
-- pretrained on `allenai/c4` for 65k steps
+- pretrained on `allenai/c4` (`en` subset) for 65k steps
 - uses an [adapted claude3 tokenizer](https://huggingface.co/BEE-spoke-data/claude-tokenizer-forT5); vocab size 65k
+
+More details and logs under [checkpoints/](https://huggingface.co/pszemraj/nanoT5-mid-65kBPE-2048/tree/main/checkpoints)
```
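The updated card still has no usage snippet, so a minimal smoke-test sketch with `transformers` follows. The repo id is taken from the checkpoints link in the diff; the Auto classes and the `<extra_id_0>` sentinel probe are assumptions based on this being a standard T5-style checkpoint whose adapted tokenizer kept T5's sentinel tokens, not something the card confirms.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id taken from the checkpoints link in the card above.
model_id = "pszemraj/nanoT5-mid-65kBPE-2048"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumption: the adapted claude3 tokenizer keeps T5-style <extra_id_*>
# sentinels, so we can probe the span-corruption pretraining objective.
# This is a raw pretrained model, so expect denoising output, not answers.
inputs = tokenizer("The capital of France is <extra_id_0>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

Per the note added in this commit, any real use would start by fine-tuning the checkpoint on a downstream task rather than generating from it directly.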