Jae-Won Chung
New leaderboard prototype
b10121d

Benchmarking dataset

500 prompt/image pairs were sampled from the conversation subset of the LLaVA-Instruct-150K dataset.

Obtaining and filtering the dataset

First, download the full conversation dataset (58k samples):

curl -L https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/resolve/9d451dc7629cfe0469f6ae4432b765cd603d5fcb/conversation_58k.json -o full.json
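To sanity-check the download before going further, the short script below loads full.json and counts records, distinct images, and conversation turns per speaker. It is a minimal sketch that assumes the usual LLaVA-Instruct-150K record layout: each record is a dict with "id", "image", and "conversations" keys, and "conversations" is a list of {"from": ..., "value": ...} turns. The helper name summarize_llava is hypothetical and not part of this repository.

```python
import json
import os
from collections import Counter


def summarize_llava(records):
    """Return basic counts for a list of LLaVA-style conversation records.

    Assumption: each record has "id", "image" and "conversations" keys, and
    "conversations" is a list of {"from": ..., "value": ...} turns.
    """
    turns_by_role = Counter()
    for rec in records:
        for turn in rec["conversations"]:
            turns_by_role[turn["from"]] += 1
    return {
        "num_records": len(records),
        "num_unique_images": len({rec["image"] for rec in records}),
        "turns_by_role": dict(turns_by_role),
    }


if __name__ == "__main__":
    # Only run against the real file if it has already been downloaded.
    if os.path.exists("full.json"):
        with open("full.json") as f:
            print(summarize_llava(json.load(f)))
    else:
        print("full.json not found; run the curl command first")
```

If the download succeeded, num_records should be close to the 58k samples mentioned above.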

Also download the COCO 2017 training set and extract the images:

curl -LO http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip
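Before sampling, it can help to confirm that every image referenced by the conversation file was actually extracted. The sketch below assumes that each record's "image" field is a bare COCO filename (for example a 12-digit name ending in .jpg) located directly under train2017/. The function name find_missing_images is hypothetical.

```python
import json
import os


def find_missing_images(records, available_files):
    """Return sorted image filenames referenced by records but absent from available_files.

    Assumption: each record's "image" field is a bare filename that should
    appear directly in the extracted train2017/ directory.
    """
    referenced = {rec["image"] for rec in records}
    return sorted(referenced - set(available_files))


if __name__ == "__main__":
    # Only check the real data if both download steps have been completed.
    if os.path.exists("full.json") and os.path.isdir("train2017"):
        with open("full.json") as f:
            records = json.load(f)
        missing = find_missing_images(records, os.listdir("train2017"))
        print(f"{len(missing)} referenced images are missing")
    else:
        print("full.json or train2017/ not found; run the download steps first")
```

Passing a list of filenames rather than a directory path keeps the check independent of the filesystem, which makes it easy to reuse on a partial image listing.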

Finally, run the sampling script, which (1) samples 500 pairs, (2) reads the corresponding COCO image for each pair, (3) encodes each image as a base64 string, and (4) merges all pairs into a single JSON file. After this step, the original dataset files (LLaVA and COCO) may be deleted.

python sample.py
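The actual contents of sample.py are not shown here, so the following is only a hypothetical sketch of the four steps it performs, not the repository's implementation. It assumes the LLaVA record layout described above, takes the first human turn (with the "<image>" placeholder removed) as the prompt, and writes to a hypothetical sampled.json using hypothetical "id", "prompt", and "image" keys. The seed value and the output filename and schema are illustrative assumptions.

```python
import base64
import json
import os
import random

NUM_SAMPLES = 500          # number of pairs, as stated in this README
SEED = 42                  # hypothetical seed; the real script may differ
LLAVA_PATH = "full.json"
COCO_DIR = "train2017"
OUT_PATH = "sampled.json"  # hypothetical output filename


def first_prompt(record):
    """Return the first human turn with the LLaVA "<image>" placeholder stripped (assumed prompt choice)."""
    for turn in record["conversations"]:
        if turn["from"] == "human":
            return turn["value"].replace("<image>", "").strip()
    raise ValueError(f"record {record.get('id')} has no human turn")


def encode_image(path):
    """Read an image file and return its bytes as a base64 ASCII string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def sample_pairs(records, image_dir, k, seed):
    """Sample k records without replacement and build prompt/image pairs (hypothetical schema)."""
    rng = random.Random(seed)
    chosen = rng.sample(records, k)
    return [
        {
            "id": rec["id"],
            "prompt": first_prompt(rec),
            "image": encode_image(os.path.join(image_dir, rec["image"])),
        }
        for rec in chosen
    ]


if __name__ == "__main__":
    # Only run on the real data if both download steps have been completed.
    if os.path.exists(LLAVA_PATH) and os.path.isdir(COCO_DIR):
        with open(LLAVA_PATH) as f:
            records = json.load(f)
        pairs = sample_pairs(records, COCO_DIR, NUM_SAMPLES, SEED)
        with open(OUT_PATH, "w") as f:
            json.dump(pairs, f)
        print(f"wrote {len(pairs)} pairs to {OUT_PATH}")
    else:
        print("full.json or train2017/ not found; run the download steps first")
```

Using a dedicated random.Random instance with a fixed seed makes the 500-pair sample reproducible across runs without affecting the global random state.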