OpenCLIP

This is a fork of OpenCLIP used to fine-tune CLIP models with PinPoint counterfactuals. Refer to the original repository for more details on open_clip.

Installation

pip install open_clip_torch

Pretrained models

For LAION-pretrained models, download the checkpoints and place them in ./pretrained_models (this can be done with the open_clip CLI).

Sample single-process running code:

To fine-tune CLIP models on CC3M:

python -m open_clip_train.main \
    --save-frequency 1 \
    --zeroshot-frequency 1 \
    --report-to tensorboard \
    --train-data="..path_to_image_list.csv" \
    --csv-img-key="Image_ID" \
    --csv-caption-key="Caption" \
    --val-data="/path/to/validation_data.csv" \
    --imagenet-val="/path/to/imagenet/root/val/" \
    --warmup 10000 \
    --batch-size=128 \
    --accum_freq=10 \
    --lr=5e-06 \
    --wd=0.1 \
    --epochs=410 \
    --workers=8 \
    --pretrained_model="pretrained_models/vit_b16_laion2b.pth" \
    --model ViT-B-16
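
The batch-size and accum_freq flags above interact. Assuming this fork keeps the gradient accumulation behavior of upstream open_clip (an assumption, not verified against this code), each optimizer step sees batch size × accumulation frequency × number of processes samples. The helper below is a hypothetical illustration of that arithmetic, not part of the repository:

```python
def effective_batch_size(batch_size: int, accum_freq: int, world_size: int = 1) -> int:
    """Number of samples contributing to one optimizer step.

    Assumes upstream open_clip semantics: gradients are accumulated over
    `accum_freq` micro-batches on each of `world_size` processes.
    """
    return batch_size * accum_freq * world_size


# Values taken from the single-process command above.
print(effective_batch_size(batch_size=128, accum_freq=10))  # 1280
```

If you change the number of GPUs, you may want to adjust the accumulation frequency so that this product stays the same.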

Note: --imagenet-val is the path to the ImageNet validation set used for zero-shot evaluation, not the training set! Remove this argument if you do not want to run zero-shot evaluation on ImageNet during training. The val folder should contain subfolders (one per class). If it does not, please use this script.
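
Before launching a long run, it can help to check the layout of the ImageNet val folder. The following is a minimal sketch, not part of this repository: the function name and the set of image extensions are assumptions chosen for illustration. It only checks that the folder contains subfolders and no loose image files.

```python
from pathlib import Path

# Assumed image extensions; extend this set if your copy uses others.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def has_class_subfolders(val_root: str) -> bool:
    """Return True if val_root holds subfolders and no loose image files."""
    entries = list(Path(val_root).iterdir())
    subdirs = [p for p in entries if p.is_dir()]
    loose_images = [
        p for p in entries
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return bool(subdirs) and not loose_images
```

For example, `has_class_subfolders("/path/to/imagenet/root/val/")` returning False suggests the images are still in a flat folder and need to be reorganized first.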

Note: --train-data should point to a *.csv file that lists the generated images in the format `IMAGE_ID IMAGE_CAPTION`, with the two fields separated by '\t'. You can find the lists for our in-painted data under ./annotations.
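
If you want to build such a file list for your own images, the sketch below writes one with Python's standard csv module. It assumes the header names match the --csv-img-key and --csv-caption-key values from the command above (Image_ID and Caption), and that the loader reads the header row as upstream open_clip does. The function name and the example paths are hypothetical.

```python
import csv
from typing import Iterable, Tuple


def write_file_list(rows: Iterable[Tuple[str, str]], out_path: str) -> None:
    """Write (image_id, caption) pairs as a tab-separated file with a header."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        # Header names are assumed to match --csv-img-key / --csv-caption-key.
        writer.writerow(["Image_ID", "Caption"])
        writer.writerows(rows)
```

A call such as `write_file_list([("images/0001.png", "a dog on a beach")], "image_list.csv")` produces a two-line file that can be passed to --train-data.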