Commit 0e74c3e · 1 Parent(s): bf302be
Update README.md
README.md CHANGED
@@ -77,7 +77,7 @@ The visual embeddings are taken from the CLIP-Vision model and combined with the
 A total length of 128 tokens, including the visual embeddings, is used. The texts are truncated or padded accordingly.
 
 ### Pretraining
-The checkpoint of the model was trained on Google Cloud Engine TPUv3-8 machine (with 335 GB of RAM, 1000 GB of hard drive, 96 CPU cores) **8 v3 TPU cores** for 60k steps with a per device batch size of 64 and a max sequence length of 128. The optimizer used is Adafactor with a learning rate of 1e-4, learning rate warmup for
+The checkpoint of the model was trained on Google Cloud Engine TPUv3-8 machine (with 335 GB of RAM, 1000 GB of hard drive, 96 CPU cores) **8 v3 TPU cores** for 60k steps with a per device batch size of 64 and a max sequence length of 128. The optimizer used is Adafactor with a learning rate of 1e-4, learning rate warmup for 5,000 steps, and linear decay of the learning rate after.
 
 We tracked experiments using TensorBoard. Here is the link to the main dashboard: [CLIP Vision BERT CC12M Pre-training Dashboard](https://huggingface.co/flax-community/multilingual-vqa-pt-ckpts/tensorboard)
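The added text completes the description of the learning-rate schedule: linear warmup to 1e-4 over 5,000 steps, then linear decay over the remainder of the 60k-step run, driving an Adafactor optimizer. A minimal sketch of that schedule in optax (the optimizer library commonly used in Flax/JAX training scripts like those in this repo) is below; the training script itself is not part of this commit, so the names and structure here are illustrative assumptions, not the repository's code.

```python
import optax

# Hyperparameters as stated in the README text above (assumed constants;
# the actual training script is not shown in this commit).
TOTAL_STEPS = 60_000   # total pretraining steps
WARMUP_STEPS = 5_000   # linear warmup steps
PEAK_LR = 1e-4         # peak learning rate

# Linear warmup from 0 to the peak learning rate over the warmup steps...
warmup_fn = optax.linear_schedule(
    init_value=0.0, end_value=PEAK_LR, transition_steps=WARMUP_STEPS
)
# ...then linear decay back toward 0 over the remaining steps.
decay_fn = optax.linear_schedule(
    init_value=PEAK_LR,
    end_value=0.0,
    transition_steps=TOTAL_STEPS - WARMUP_STEPS,
)
# Stitch the two pieces together at the warmup boundary.
learning_rate_fn = optax.join_schedules(
    schedules=[warmup_fn, decay_fn], boundaries=[WARMUP_STEPS]
)

# Adafactor driven by the combined warmup-then-decay schedule.
optimizer = optax.adafactor(learning_rate=learning_rate_fn)
```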