
Use Cases

Document Question Answering models can be used to answer natural language questions about documents. Typically, document QA models consider textual, layout, and potentially visual information. This is useful when the question requires some understanding of the visual aspects of the document. Nevertheless, certain document QA models can work without document images. Hence the task is not limited to visually rich documents: users can also ask questions about spreadsheets, text PDFs, and more!

Document Parsing

One of the most popular use cases of document question answering models is the parsing of structured documents. For example, you can extract the name, address, and other information from a form. You can also use the model to extract information from a table, or even a resume.

Invoice Information Extraction

Another very popular use case is invoice information extraction. For example, you can extract the invoice number, the invoice date, the total amount, the VAT number, and the invoice recipient.
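In practice, this means asking the pipeline one question per field and keeping the best-scored answer. The sketch below mocks the list-of-dicts output that each pipeline call returns (the field names and values are illustrative, not real model output) to show the post-processing step:

```python
# A minimal sketch of turning per-question pipeline results into an invoice record.
# The pipeline call itself is omitted; `raw_results` mimics the list of dicts that
# each `pipe(image=..., question=...)` call returns (field names are illustrative).

def top_answer(result):
    """Pick the highest-scoring answer from a pipeline result list."""
    return max(result, key=lambda r: r.get("score", 0.0))["answer"]

raw_results = {
    "invoice_number": [{"answer": "INV-0042", "score": 0.98}],
    "total_amount": [{"answer": "$1,250.00", "score": 0.91},
                     {"answer": "$250.00", "score": 0.40}],
}

# One extracted value per field, keeping only the best candidate.
invoice = {field: top_answer(result) for field, result in raw_results.items()}
print(invoice)
```

The same pattern applies to form parsing: one question per field you want to extract.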

Inference

You can run inference with Document QA models using the 🤗 Transformers library and the document-question-answering pipeline. If no model checkpoint is given, the pipeline is initialized with impira/layoutlm-document-qa. The pipeline takes question(s) and document(s) as input, and returns the answer.
👉 Note that the question answering task solved here is extractive: the model extracts the answer from a context (the document).

from transformers import pipeline
from PIL import Image

# Donut is an OCR-free model, so no separate OCR engine is required.
pipe = pipeline("document-question-answering", model="naver-clova-ix/donut-base-finetuned-docvqa")

question = "What is the purchase amount?"
image = Image.open("your-document.png")

pipe(image=image, question=question)
# [{'answer': '20,000$'}]
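The answer comes back as a raw string. If you need a numeric value downstream, a small helper (illustrative, not part of the pipeline API) can normalize it:

```python
import re

def parse_amount(answer: str) -> float:
    """Strip currency symbols and thousands separators from an extracted answer."""
    return float(re.sub(r"[^\d.]", "", answer))

print(parse_amount("20,000$"))  # 20000.0
```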

Useful Resources

Would you like to learn more about Document QA? Awesome! Here are some curated resources that you may find helpful!

Notebooks

Documentation

The contents of this page are contributed by Eliott Zemour and reviewed by Kwadwo Agyapon-Ntra and Ankur Goyal.