---
license: apache-2.0
language:
- en
---

# InterleavedBench

This is the official Hugging Face repo for the paper "**Holistic Evaluation for Interleaved Text-and-Image Generation**".

This is a research preview. More details, including the baseline models' predictions, will be added in the coming weeks.

**Paper: https://arxiv.org/abs/2406.14643**

**Website: https://vt-nlp.github.io/InterleavedEval/**

## How to use InterleavedBench

### Repo hierarchy
- `interleaved_bench.json` is the main JSON file of the dataset.
- `zipped_images` is the directory of zipped images for each subset, including the images for the context and the ground truths.
- `src/interleavedeval_gpt4o.py` is the Python script for InterleavedEval with GPT-4o. Its input is the model prediction file.

### To get started
- Unzip the image files under `zipped_images`.
- Run inference on `interleaved_bench.json` with your model to obtain its output (including text and images).
- Use the script `src/interleavedeval_gpt4o.py` to perform the evaluation.

### Important notes
- For image editing and subject-driven generation tasks, the scores on text-related aspects (text quality, text-image coherence) are directly set to 0. Please skip those scores when you compute the overall performance.

One example in `interleaved_bench.json` is as follows:
```
{
    "id": "wikihow_next_step_0_489157",
    "image": [
        "wiki_images_test/489157_0_0.png",
        "wiki_images_test/489157_0_1.png",
        "wiki_images_test/489157_0_2.png",
        "wiki_images_test/489157_0_3.png",
        "wiki_images_test/489157_0_4.png"
    ],
    "task_name": "wikihow_next_step",
    "conversations": [
        {
            "from": "human",
            "value": "In this task, you are given a high-level goal 'How to Make a Banana Shake': Banana shakes are a tasty way to get a lot of nutrients all at once. Bananas provide a creamy, smooth texture when turned into a drink. Bananas also fill empty stomachs, staving off hunger pangs and giving you a nice energy burst. In this article you'll find a few ways to make banana shakes, among the many possibilities. \n You need to assist human user to complete this task via making a banana shake with kefir. Given the previous steps, you need to predict the subsequent 4 steps to help the user to finish the task. The previous steps are: \n Put 2 to 3 bananas in a bowl. \n"
        },
        {
            "from": "gpt",
            "value": "Now put in a liter of kefir and a teaspoon of sugar. \n Put 1 cup milk into the mix. \n Using a blender, blend all ingredients together. \n Relax with your fresh banana smoothie! \n"
        }
    ],
    "goal": "How to Make a Banana Shake",
    "category": [
        "Food and Entertaining",
        "Drinks",
        "Smoothies Shakes and Milk",
        "Fruit Based Shakes"
    ],
    "dataset_id": "wikihow_selected_test_uni"
},
```

### Reference
If you find our work useful or interesting, please cite:
```
@article{liu_holistic_2024,
  author     = {Minqian Liu and Zhiyang Xu and Zihao Lin and Trevor Ashby and Joy Rimchala and Jiaxin Zhang and Lifu Huang},
  title      = {Holistic Evaluation for Interleaved Text-and-Image Generation},
  journal    = {CoRR},
  volume     = {abs/2406.14643},
  year       = {2024},
  url        = {https://doi.org/10.48550/arXiv.2406.14643},
  doi        = {10.48550/ARXIV.2406.14643},
  eprinttype = {arXiv},
  eprint     = {2406.14643},
  timestamp  = {Tue, 16 Jul 2024 16:17:50 +0200}
}
```
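
### Example: loading and scoring (sketch)
The steps under "To get started" and the note on skipped scores can be sketched in Python roughly as follows. This is a minimal sketch and not part of the official repo: the helper names are our own, the archives under `zipped_images` are assumed to be `.zip` files, `interleaved_bench.json` is assumed to be a JSON list of examples shaped like the one shown in this README, and the aspect keys in `TEXT_ASPECTS` are hypothetical placeholders that may not match the output of `src/interleavedeval_gpt4o.py`.

```python
import json
import zipfile
from pathlib import Path

# Hypothetical key names for the text-related aspects; the real names
# produced by the evaluation script may differ.
TEXT_ASPECTS = ("text_quality", "text_image_coherence")


def extract_images(zip_dir, out_dir):
    """Extract every *.zip archive found directly under zip_dir into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
    return out_dir


def load_examples(json_path):
    """Load the benchmark file (assumed to be a JSON list of example dicts)."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def split_turns(example):
    """Return (instruction, reference) from the first human and gpt turns."""
    turns = example["conversations"]
    instruction = next(t["value"] for t in turns if t["from"] == "human")
    reference = next(t["value"] for t in turns if t["from"] == "gpt")
    return instruction, reference


def mean_score(scores, skipped_aspects=()):
    """Average per-aspect scores, leaving out the aspects in skipped_aspects."""
    kept = [value for aspect, value in scores.items()
            if aspect not in skipped_aspects]
    return sum(kept) / len(kept) if kept else None
```

A typical flow would call `extract_images("zipped_images", "images")`, then `load_examples("interleaved_bench.json")`, and pass each example through `split_turns` to obtain the prompt and the reference answer. Whether the paths in the `image` field resolve relative to the extraction directory is an assumption to verify against the unzipped layout. For image editing and subject-driven generation examples, passing `TEXT_ASPECTS` as `skipped_aspects` to `mean_score` keeps the zeroed text scores out of the average.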