	## Use Cases

	### Aid for Visually Impaired Persons

	VQA models can be used to reduce visual barriers for visually impaired individuals by allowing them to get information about images from the web and the real world.

	### Education

	VQA models can be used to improve experiences at museums by allowing observers to directly ask questions they are interested in.

	### Improved Image Retrieval

	Visual question answering models can be used to retrieve images with specific characteristics. For example, the user can ask "Is there a dog?" to find all images with dogs from a set of images.
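
The filtering idea above can be sketched as follows. This is a minimal sketch, assuming a callable with the same interface as the `transformers` visual question answering pipeline: it takes an image and a question and returns a list of `{'score', 'answer'}` dictionaries. The helper name `filter_images` and the `min_score` threshold are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List


def filter_images(
    vqa: Callable[..., List[Dict[str, Any]]],
    images: Dict[str, Any],
    question: str,
    min_score: float = 0.5,
) -> List[str]:
    """Return the names of images whose top answer is a confident "yes".

    `vqa` is assumed to behave like a visual-question-answering pipeline:
    vqa(image, question, top_k=1) -> [{'score': float, 'answer': str}]
    """
    matches = []
    for name, image in images.items():
        # Only the highest-scoring answer is considered for each image.
        top = vqa(image, question, top_k=1)[0]
        if top["answer"].strip().lower() == "yes" and top["score"] >= min_score:
            matches.append(name)
    return matches
```

With a real pipeline, a call might look like `filter_images(pipeline("visual-question-answering"), {"park.jpg": Image.open("park.jpg")}, "Is there a dog?")`, where `park.jpg` is a placeholder file name. A suitable threshold likely depends on the model and the question.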

	### Video Search

	Specific snippets/timestamps of a video can be retrieved based on search queries. For example, the user can ask "At which part of the video does the guitar appear?" and get a specific timestamp range from the whole video.
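
One naive way to approximate this with an image VQA model is to sample frames at a fixed interval, ask the question of each frame, and merge consecutive "yes" frames into timestamp ranges. The sketch below shows only the merging step. The function name `yes_ranges`, the per-frame answer list, and the sampling interval are assumptions for illustration, and dedicated video models may work quite differently.

```python
from typing import List, Tuple


def yes_ranges(answers: List[str], seconds_per_frame: float) -> List[Tuple[float, float]]:
    """Merge consecutive "yes" frames into (start, end) ranges in seconds.

    answers[i] is assumed to be the top VQA answer for the frame sampled at
    time i * seconds_per_frame.
    """
    ranges = []
    start = None  # index of the first frame in the current run of "yes" answers
    for i, answer in enumerate(answers):
        is_yes = answer.strip().lower() == "yes"
        if is_yes and start is None:
            start = i
        elif not is_yes and start is not None:
            ranges.append((start * seconds_per_frame, i * seconds_per_frame))
            start = None
    if start is not None:  # the run of "yes" answers continues to the last frame
        ranges.append((start * seconds_per_frame, len(answers) * seconds_per_frame))
    return ranges


# Hypothetical per-frame answers for one frame sampled every 2 seconds.
frame_answers = ["no", "yes", "yes", "no", "no", "yes"]
print(yes_ranges(frame_answers, seconds_per_frame=2.0))
# [(2.0, 6.0), (10.0, 12.0)]
```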

	## Task Variants

	### Video Question Answering

	Video Question Answering aims to answer questions asked about the content of a video.

	## Inference

	You can run inference with Visual Question Answering models using the `vqa` (or `visual-question-answering`) pipeline. This pipeline requires [the Python Imaging Library (PIL)](https://pillow.readthedocs.io/en/stable/) to process images. You can install it with `pip install pillow`.

	```python
	from PIL import Image
	from transformers import pipeline

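	# No model is specified, so the pipeline falls back to the task's default checkpoint
	vqa_pipeline = pipeline("visual-question-answering")

	image = Image.open("elephant.jpeg")
	question = "Is there an elephant?"

	# top_k=1 returns only the highest-scoring answer
	vqa_pipeline(image, question, top_k=1)
	# [{'score': 0.9998154044151306, 'answer': 'yes'}]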
	```

	## Useful Resources

	- [An introduction to Visual Question Answering - AllenAI](https://blog.allenai.org/vanilla-vqa-adcaaaa94336)
	- [Multi Modal Framework (MMF) - Meta Research](https://mmf.sh/docs/getting_started/video_overview/)

	The contents of this page are contributed by [Bharat Raghunathan](https://huggingface.co/bharat-raghunathan) and [Jose Londono Botero](https://huggingface.co/jlondonobo).