---
license: apache-2.0
language:
- en
---

# InterleavedBench (EMNLP'24 Main Conference)

This is the official Hugging Face repository for the paper "**Holistic Evaluation for Interleaved Text-and-Image Generation**", accepted to the EMNLP 2024 Main Conference.

**Paper: https://arxiv.org/abs/2406.14643**

**Website: https://vt-nlp.github.io/InterleavedEval/**

## How to use InterleavedBench

### Repo hierarchy

- `interleaved_bench.json` is the main JSON file of the dataset.
- `zipped_images` is the directory of zipped images for each subset, including the images for the context and the ground truths.
- `src/interleavedeval_gpt4o.py` is the Python script for InterleavedEval with GPT-4o. Its input is the model prediction file.

### To get started

- Unzip the image files under `zipped_images`.
- Run inference on `interleaved_bench.json` with your model to obtain its outputs (both text and images).
- Use the script `src/interleavedeval_gpt4o.py` to evaluate those outputs.

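
The first step above can be scripted. The following is a minimal sketch, assuming the archives under `zipped_images` are ordinary `.zip` files; the helper name `unzip_all` and the output directory `images` are illustrative choices, not names defined by this repo.

```python
import zipfile
from pathlib import Path


def unzip_all(zip_dir: str, out_dir: str) -> list[str]:
    """Extract every .zip archive found in zip_dir into out_dir.

    Returns the names of the archives that were extracted, in sorted order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
        extracted.append(archive.name)
    return extracted


if __name__ == "__main__":
    # "images" is an assumed output directory; pick whatever location
    # your inference code expects the image paths to resolve against.
    print(unzip_all("zipped_images", "images"))
```

Extracting everything into one directory keeps relative paths such as `wiki_images_test/...` intact, provided the archives store them that way.
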
### Important notes

- For the image editing and subject-driven generation tasks, the scores on text-related aspects (text quality, text-image coherence) are directly set to 0. Exclude those scores when you compute the overall performance.

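
One way to apply the note above is to drop the placeholder zeros before averaging. This is a hedged sketch only: the record layout, the aspect keys (`text_quality`, `text_image_coherence`, `perceptual_quality`), and the task names in `IMAGE_ONLY_TASKS` are hypothetical and should be replaced with the names that actually appear in your score files.

```python
from statistics import mean

# Hypothetical names: adjust to the keys used in your own score files.
TEXT_ASPECTS = {"text_quality", "text_image_coherence"}
IMAGE_ONLY_TASKS = {"image_editing", "subject_driven_generation"}


def aspect_averages(records):
    """Average each aspect over all records, skipping placeholder zeros.

    Each record is assumed to look like
    {"task_name": str, "scores": {aspect_name: number}}.
    Text-related aspects of image-only tasks are excluded from the average.
    """
    buckets = {}
    for rec in records:
        for aspect, score in rec["scores"].items():
            if rec["task_name"] in IMAGE_ONLY_TASKS and aspect in TEXT_ASPECTS:
                continue  # placeholder 0, not a real judgment
            buckets.setdefault(aspect, []).append(score)
    return {aspect: mean(values) for aspect, values in buckets.items()}


if __name__ == "__main__":
    demo = [
        {"task_name": "image_editing",
         "scores": {"text_quality": 0, "perceptual_quality": 4}},
        {"task_name": "wikihow_next_step",
         "scores": {"text_quality": 3, "perceptual_quality": 2}},
    ]
    print(aspect_averages(demo))
```

Filtering by task and aspect, rather than dropping every zero, avoids discarding a genuine score of 0 on other tasks.
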
An example entry in `interleaved_bench.json` looks as follows:

```
{
    "id": "wikihow_next_step_0_489157",
    "image": [
        "wiki_images_test/489157_0_0.png",
        "wiki_images_test/489157_0_1.png",
        "wiki_images_test/489157_0_2.png",
        "wiki_images_test/489157_0_3.png",
        "wiki_images_test/489157_0_4.png"
    ],
    "task_name": "wikihow_next_step",
    "conversations": [
        {
            "from": "human",
            "value": "In this task, you are given a high-level goal 'How to Make a Banana Shake': Banana shakes are a tasty way to get a lot of nutrients all at once. Bananas provide a creamy, smooth texture when turned into a drink. Bananas also fill empty stomachs, staving off hunger pangs and giving you a nice energy burst. In this article you'll find a few ways to make banana shakes, among the many possibilities. \n You need to assist human user to complete this task via making a banana shake with kefir. Given the previous steps, you need to predict the subsequent 4 steps to help the user to finish the task. The previous steps are: \n <BEGIN> Put 2 to 3 bananas in a bowl. <image>\n"
        },
        {
            "from": "gpt",
            "value": "Now put in a liter of kefir and a teaspoon of sugar. <image>\n Put 1 cup milk into the mix. <image>\n Using a blender, blend all ingredients together. <image>\n Relax with your fresh banana smoothie! <image>\n"
        }
    ],
    "goal": "How to Make a Banana Shake",
    "category": [
        "Food and Entertaining",
        "Drinks",
        "Smoothies Shakes and Milk",
        "Fruit Based Shakes"
    ],
    "dataset_id": "wikihow_selected_test_uni"
}
```

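
To illustrate how such an entry might be consumed, here is a small sketch that splits the `image` list into context images and ground-truth images. It assumes that image paths are listed in the same order as the `<image>` placeholders appear across the conversation turns (which holds for the example above: one placeholder in the `human` turn, four in the `gpt` turn, five paths in total) and that `interleaved_bench.json` is a JSON list of such entries. The helper names `pair_images` and `load_benchmark` are illustrative.

```python
import json

IMAGE_TOKEN = "<image>"


def pair_images(example: dict) -> dict:
    """Assign image paths to conversation roles by counting <image> tokens.

    Assumption: paths in example["image"] follow the order in which the
    placeholders occur across the turns of example["conversations"].
    """
    paths = list(example["image"])
    by_role = {}
    cursor = 0
    for turn in example["conversations"]:
        n = turn["value"].count(IMAGE_TOKEN)
        by_role.setdefault(turn["from"], []).extend(paths[cursor:cursor + n])
        cursor += n
    return by_role


def load_benchmark(path: str) -> list:
    """Load the benchmark file, assumed to be a JSON list of entries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    for entry in load_benchmark("interleaved_bench.json")[:3]:
        roles = pair_images(entry)
        print(entry["id"], {role: len(imgs) for role, imgs in roles.items()})
```

Under this assumption, the images assigned to `human` form the model's input context and those assigned to `gpt` are the ground-truth outputs.
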
### Reference

If you find our work useful or interesting, please cite:

```
@article{liu_holistic_2024,
  author     = {Minqian Liu and
                Zhiyang Xu and
                Zihao Lin and
                Trevor Ashby and
                Joy Rimchala and
                Jiaxin Zhang and
                Lifu Huang},
  title      = {Holistic Evaluation for Interleaved Text-and-Image Generation},
  journal    = {CoRR},
  volume     = {abs/2406.14643},
  year       = {2024},
  url        = {https://doi.org/10.48550/arXiv.2406.14643},
  doi        = {10.48550/ARXIV.2406.14643},
  eprinttype = {arXiv},
  eprint     = {2406.14643},
  timestamp  = {Tue, 16 Jul 2024 16:17:50 +0200}
}
```