pszemraj committed
Commit 9503ceb · verified · 1 parent: b99a8a2

Update README.md

Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -2,13 +2,21 @@
 license: apache-2.0
 datasets:
 - allenai/c4
+language:
+- en
 ---
 
 # nanoT5-mid-65kBPE-2048
 
+> [!NOTE]
+> This is a "raw" pretrained model intended to be fine-tuned on downstream tasks
+
+
 A "mid" size T5 model pretrained on c4:
 
 - trained @ context length 2048
 - 16 layers, hidden size 1024, FF 3072. SiLU activations
-- pretrained on `allenai/c4` for 65k steps
-- uses an [adapted claude3 tokenizer](https://huggingface.co/BEE-spoke-data/claude-tokenizer-forT5); vocab size 65k
+- pretrained on `allenai/c4` (`en` subset) for 65k steps
+- uses an [adapted claude3 tokenizer](https://huggingface.co/BEE-spoke-data/claude-tokenizer-forT5); vocab size 65k
+
+More details and logs under [checkpoints/](https://huggingface.co/pszemraj/nanoT5-mid-65kBPE-2048/tree/main/checkpoints)
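
The hunk header `@@ -2,13 +2,21 @@` and the file stat `+10 -2` describe the same change from two angles, and the arithmetic connecting them can be checked. Below is a minimal sketch, assuming the commit contains a single hunk (as it appears to here). The function names `hunk_line_counts` and `context_lines` are made up for illustration and are not part of any Hugging Face or git tooling.

```python
import re

# Unified-diff hunk header: @@ -old_start[,old_count] +new_start[,new_count] @@
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_line_counts(header):
    """Return (old_count, new_count) parsed from a hunk header."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    # A missing count defaults to 1 in the unified diff format.
    old_count = int(m.group(2)) if m.group(2) is not None else 1
    new_count = int(m.group(4)) if m.group(4) is not None else 1
    return old_count, new_count


def context_lines(header, added, removed):
    """Return the number of unchanged lines implied by header and stat.

    Only meaningful for a single-hunk diff (an assumption of this sketch).
    Raises ValueError if the two sides of the hunk disagree.
    """
    old_count, new_count = hunk_line_counts(header)
    ctx_old = old_count - removed  # old side = context + removed lines
    ctx_new = new_count - added    # new side = context + added lines
    if ctx_old != ctx_new or ctx_old < 0:
        raise ValueError("hunk header and diff stat are inconsistent")
    return ctx_old


old_count, new_count = hunk_line_counts("@@ -2,13 +2,21 @@")
unchanged = context_lines("@@ -2,13 +2,21 @@", added=10, removed=2)
print(old_count, new_count, unchanged)  # prints: 13 21 11
```

Both sides agree on 11 unchanged lines: 13 old lines minus 2 removed, and 21 new lines minus 10 added.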