Commit 0e74c3e · 1 Parent(s): bf302be
Update README.md
README.md CHANGED
@@ -77,7 +77,7 @@ The visual embeddings are taken from the CLIP-Vision model and combined with the
 A total length of 128 tokens, including the visual embeddings, is used. The texts are truncated or padded accordingly.
 
 ### Pretraining
-The checkpoint of the model was trained on Google Cloud Engine TPUv3-8 machine (with 335 GB of RAM, 1000 GB of hard drive, 96 CPU cores) **8 v3 TPU cores** for 60k steps with a per device batch size of 64 and a max sequence length of 128. The optimizer used is Adafactor with a learning rate of 1e-4, learning rate warmup for
+The checkpoint of the model was trained on Google Cloud Engine TPUv3-8 machine (with 335 GB of RAM, 1000 GB of hard drive, 96 CPU cores) **8 v3 TPU cores** for 60k steps with a per device batch size of 64 and a max sequence length of 128. The optimizer used is Adafactor with a learning rate of 1e-4, learning rate warmup for 5,000 steps, and linear decay of the learning rate after.
 
 We tracked experiments using TensorBoard. Here is the link to the main dashboard: [CLIP Vision BERT CC12M Pre-training Dashboard](https://huggingface.co/flax-community/multilingual-vqa-pt-ckpts/tensorboard)
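The added text completes the description of the learning-rate schedule: linear warmup to 1e-4 over 5,000 steps, then linear decay over the remainder of the 60k-step run, driving an Adafactor optimizer. A minimal sketch of that schedule in optax (the optimizer library commonly used in Flax/JAX training scripts like those in this repo) is below; the training script itself is not part of this commit, so the names and structure here are illustrative assumptions, not the repository's code.

```python
import optax

# Hyperparameters as stated in the README text above (assumed constants;
# the actual training script is not shown in this commit).
TOTAL_STEPS = 60_000   # total pretraining steps
WARMUP_STEPS = 5_000   # linear warmup steps
PEAK_LR = 1e-4         # peak learning rate

# Linear warmup from 0 to the peak learning rate over the warmup steps...
warmup_fn = optax.linear_schedule(
    init_value=0.0, end_value=PEAK_LR, transition_steps=WARMUP_STEPS
)
# ...then linear decay back toward 0 over the remaining steps.
decay_fn = optax.linear_schedule(
    init_value=PEAK_LR,
    end_value=0.0,
    transition_steps=TOTAL_STEPS - WARMUP_STEPS,
)
# Stitch the two pieces together at the warmup boundary.
learning_rate_fn = optax.join_schedules(
    schedules=[warmup_fn, decay_fn], boundaries=[WARMUP_STEPS]
)

# Adafactor driven by the combined warmup-then-decay schedule.
optimizer = optax.adafactor(learning_rate=learning_rate_fn)
```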