# Visual Question Answering and Image Captioning using BLIP and OpenVINO

[BLIP](https://arxiv.org/abs/2201.12086) is a pre-training framework for unified vision-language understanding and generation that achieves state-of-the-art results on a wide range of vision-language tasks.

This tutorial shows how to use BLIP for visual question answering and image captioning. The complete pipeline of this demo is shown below.
## Image Captioning

<p align="center">
<img src="https://user-images.githubusercontent.com/29454499/221865836-a56da06e-196d-449c-a5dc-4136da6ab5d5.png"/>
</p>

The following image shows an example of an input image and the generated caption:

<p align="center">
<img src="https://user-images.githubusercontent.com/29454499/221933471-5c06cc51-073c-48af-b514-bddce1a89aaa.png"/>
</p>
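Outside of OpenVINO, the captioning flow above can be sketched with the Hugging Face `transformers` API. This is a minimal sketch, assuming the `Salesforce/blip-image-captioning-base` checkpoint and a placeholder image; the notebook itself runs inference through the converted OpenVINO model instead:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor (image preprocessing + tokenizer) and the captioning model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# A blank placeholder image; replace with your own photo
image = Image.new("RGB", (384, 384), color="white")

# Preprocess the image and autoregressively generate a caption
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```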
## Visual Question Answering

<p align="center">
<img src="https://user-images.githubusercontent.com/29454499/221868167-d0081add-d9f3-4591-80e7-4753c88c1d0a.png"/>
</p>

The following image shows an example of an input image, a question, and the answer generated by the model:

<p align="center">
<img src="https://user-images.githubusercontent.com/29454499/221933762-4ff32ecb-5e5d-4484-80e1-e9396cb3c511.png"/>
</p>
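The question-answering flow differs from captioning only in that the question text is fed alongside the image. A minimal sketch with the `transformers` API, assuming the `Salesforce/blip-vqa-base` checkpoint and a placeholder image (the notebook runs this through OpenVINO instead):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Load the processor and the VQA variant of BLIP
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# A blank placeholder image; replace with your own photo
image = Image.new("RGB", (384, 384), color="white")
question = "What color is the background?"

# The question is tokenized and passed together with the image features
inputs = processor(images=image, text=question, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10)
answer = processor.decode(output_ids[0], skip_special_tokens=True)
print(answer)
```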
## Notebook Contents

This folder contains a notebook that shows how to convert and optimize the model with OpenVINO. The tutorial consists of the following parts:

1. Instantiate a BLIP model.
2. Convert the BLIP model to OpenVINO IR.
3. Run visual question answering and image captioning with OpenVINO.
4. Optimize the BLIP model using NNCF.
5. Compare the original and optimized models.
6. Launch an interactive demo.
## Installation Instructions

This is a self-contained example that relies solely on its own code.

We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the [Installation Guide](../../README.md).
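A typical virtual-environment setup might look like the following (a sketch for Linux/macOS, assuming Python 3 is on `PATH`; the Installation Guide is the authoritative reference):

```shell
# Create and activate an isolated environment
python3 -m venv .venv
source .venv/bin/activate

# Install a Jupyter server into the environment
pip install --upgrade pip
pip install jupyterlab

# Then start the server and open the notebook:
# jupyter lab
```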