# OpenCLIP

This is a fork of [OpenCLIP](https://github.com/mlfoundations/open_clip) used to fine-tune CLIP models with PinPoint counterfactuals. Refer to the original repository for more details on open_clip.


### Installation

```bash
pip install open_clip_torch
```
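
To quickly verify the installation, a minimal sanity check (using the standard open_clip API) is:

```python
import open_clip

# Print a few of the available (architecture, pretrained-tag) pairs
# to confirm the package imports and resolves its pretrained registry.
print(open_clip.list_pretrained()[:5])
```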


### Pretrained models

For LAION-pretrained models, download the checkpoints and place them in the `./pretrained_models` directory (this can be done with the open_clip CLI interface).
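
For example, one way to fetch the LAION-2B ViT-B-16 weights and store them under `./pretrained_models` is sketched below (the pretrained tag and the exact checkpoint format expected by this fork are assumptions; adjust to your setup):

```python
import os
import torch
import open_clip

os.makedirs("pretrained_models", exist_ok=True)

# Download the LAION-2B ViT-B-16 weights through open_clip's model factory.
# The pretrained tag is an assumption; see open_clip.list_pretrained() for valid tags.
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="laion2b_s34b_b88k"
)

# Save the state dict under the filename used by the training command below.
torch.save(model.state_dict(), "pretrained_models/vit_b16_laion2b.pth")
```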

### Sample single-process training command

To finetune CLIP models on CC3M:

```bash
python -m open_clip_train.main \
    --save-frequency 1 \
    --zeroshot-frequency 1 \
    --report-to tensorboard \
    --train-data="..path_to_image_list.csv" \
    --csv-img-key="Image_ID" \
    --csv-caption-key="Caption" \
    --val-data="/path/to/validation_data.csv"  \
    --imagenet-val="/path/to/imagenet/root/val/" \
    --warmup 10000 \
    --batch-size=128 \
    --accum_freq=10 \
    --lr=5e-06 \
    --wd=0.1 \
    --epochs=410 \
    --workers=8 \
    --pretrained_model="pretrained_models/vit_b16_laion2b.pth" \
    --model ViT-B-16
```

Note: `imagenet-val` is the path to the *validation* set of ImageNet for zero-shot evaluation, not the training set!
You can remove this argument if you do not want to perform zero-shot evaluation on ImageNet throughout training. Note that the `val` folder should contain subfolders. If it does not, please use [this script](https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh).

Note: `--train-data` should point to a *.csv file that contains the file list with generated images in the following format:
`IMAGE_ID	IMAGE_CAPTION`, with the two columns separated by a tab (`\t`). You can find the lists for our in-painted data under `./annotations`.
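
If you need to build such a file list yourself, a minimal sketch (assuming pandas and the column names passed via `--csv-img-key`/`--csv-caption-key` above) is:

```python
import pandas as pd

# Hypothetical example rows; replace with your own image paths/IDs and captions.
rows = [
    {"Image_ID": "images/000001.jpg", "Caption": "a dog playing in the snow"},
    {"Image_ID": "images/000002.jpg", "Caption": "a red car parked on a street"},
]

# Write a tab-separated file with the columns expected by --csv-img-key / --csv-caption-key.
pd.DataFrame(rows).to_csv("image_list.csv", sep="\t", index=False)
```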