
Model Details

This is an unofficial implementation of ALIGN trained on COYO-700M. The official ALIGN model was trained on a proprietary dataset of 1.8B image-text pairs that has not been released to the public, so we trained our implementation of the ALIGN model on COYO-700M instead.

It was developed by Kakao Brain to validate the performance of the COYO-700M dataset on a large-scale model.

Training took about 8 days on a TPU v3-512.

Model Date

April 2022

Model Type

This is a dual-encoder model in which:

  • the image encoder uses the EfficientNet-B7 architecture
  • the text encoder uses the BERT-base architecture
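To illustrate the dual-encoder design, here is a minimal sketch of how the two towers interact at inference time: each encoder maps its input to an embedding, both embeddings are L2-normalized, and a scaled dot product scores image-text pairs. The embedding dimension, temperature value, and random "embeddings" below are illustrative assumptions, not values from this model.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Project embeddings onto the unit sphere so the dot product
    # equals cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def similarity_matrix(image_emb, text_emb, temperature=0.05):
    # Rows index images, columns index texts; higher score = better match.
    # The temperature here is a hypothetical value for illustration.
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return img @ txt.T / temperature

# Toy example: 3 images and 3 captions with random stand-in embeddings
# (in the real model these would come from EfficientNet-B7 and BERT-base).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(3, 640))  # 640 is an assumed embedding dim
text_emb = rng.normal(size=(3, 640))
sims = similarity_matrix(image_emb, text_emb)
print(sims.shape)  # (3, 3)
```

During training, ALIGN-style models apply a contrastive loss over such a similarity matrix so that matching image-text pairs (the diagonal) score higher than all mismatched pairs.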

Training data

This model was trained on the COYO-700M dataset.

Evaluation results

| Model | Training data | ImageNet KNN | Flickr30k I2T R@1 | Flickr30k T2I R@1 | MsCOCO I2T R@1 | MsCOCO T2I R@1 |
|---|---|---|---|---|---|---|
| ALIGN-L2-Large (Google) | ALIGN 1.8B | 76.4 | 88.6 | 75.7 | 58.6 | 45.6 |
| ALIGN-B7-Base (Google) | ALIGN 1.8B | 69.3 | - | - | 55.4 | 41.7 |
| COYO-ALIGN-B7-Base (Kakao Brain) | COYO-700M | 68.6 | 88.1 | 73.2 | 61.2 | 43.1 |
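The retrieval metric reported above, R@1, is the fraction of queries whose top-ranked candidate is the correct match. A minimal sketch of how it can be computed from a similarity matrix (assuming the ground-truth match for query i is candidate i, as in standard Flickr30k/MsCOCO evaluation with one caption per image):

```python
import numpy as np

def recall_at_1(sims):
    # sims[i, j]: similarity of query i with candidate j.
    # Ground truth is assumed to lie on the diagonal.
    preds = sims.argmax(axis=1)
    return (preds == np.arange(sims.shape[0])).mean()

# I2T uses image rows as queries over text columns; T2I uses the transpose.
sims = np.array([
    [0.90, 0.10, 0.20],
    [0.30, 0.80, 0.10],
    [0.20, 0.75, 0.70],  # this image's best-scoring caption is wrong
])
print(recall_at_1(sims))    # I2T: 2 of 3 queries correct, ≈ 0.667
print(recall_at_1(sims.T))  # T2I: 1.0
```

In the real benchmarks each image has multiple reference captions, so the ground-truth set per query is larger than a single diagonal entry; the sketch keeps the one-to-one case for clarity.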