---
library_name: transformers
license: apache-2.0
datasets:
- ds4sd/DocLayNet
pipeline_tag: image-segmentation
---

# DETR-layout-detection

We present the model cmarkea/detr-layout-detection, which extracts layout elements (Text, Picture, Caption, Footnote, etc.) from an image of a document. It is a fine-tuned version of [detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) on the [DocLayNet](https://huggingface.co/datasets/ds4sd/DocLayNet) dataset. The model jointly predicts masks and bounding boxes for document objects, which makes it well suited to processing document corpora for ingestion into an open-domain question answering (ODQA) system.

The model detects 11 entity classes: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title.

## Performance

In this section, we evaluate the model's performance on semantic segmentation and on object detection separately. No post-processing was applied to the model's predictions.

### Semantic segmentation

### Object detection

## Direct Use

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor
from transformers.models.detr import DetrForSegmentation

img_proc = AutoImageProcessor.from_pretrained(
    "ArkeaIAF/detr-layout-detection"
)
model = DetrForSegmentation.from_pretrained(
    "ArkeaIAF/detr-layout-detection"
)

# Load a document page as an RGB PIL image (placeholder path).
img = Image.open("page.png").convert("RGB")

with torch.inference_mode():
    inputs = img_proc(img, return_tensors='pt')
    output = model(**inputs)

# Minimum confidence score for keeping a prediction.
threshold = 0.4

# PIL reports size as (width, height); target_sizes expects (height, width).
segmentation_mask = img_proc.post_process_segmentation(
    output,
    threshold=threshold,
    target_sizes=[img.size[::-1]]
)

bbox_pred = img_proc.post_process_object_detection(
    output,
    threshold=threshold,
    target_sizes=[img.size[::-1]]
)
```
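
The value returned by `post_process_object_detection` is a list with one dictionary per image, holding `scores`, `labels`, and `boxes` (absolute `x_min, y_min, x_max, y_max` coordinates). As a minimal sketch, the helper below turns one such dictionary into readable records sorted from top to bottom of the page. The name `to_records`, the `EXAMPLE_ID2LABEL` mapping, and the sample values are hypothetical and only serve the illustration; the actual class-index order should be read from `model.config.id2label`.

```python
from typing import Dict, List, Sequence


def to_records(detection: Dict[str, Sequence], id2label: Dict[int, str]) -> List[dict]:
    """Convert one detection dictionary into labeled regions, sorted top to bottom."""

    def as_list(values):
        # Accept torch tensors (which expose .tolist()) as well as plain lists.
        return values.tolist() if hasattr(values, "tolist") else list(values)

    scores = as_list(detection["scores"])
    labels = as_list(detection["labels"])
    boxes = as_list(detection["boxes"])

    records = [
        {
            "label": id2label[int(label)],
            "score": float(score),
            "box": [float(v) for v in box],
        }
        for score, label, box in zip(scores, labels, boxes)
    ]
    # Sort by the top edge (y_min), then by the left edge (x_min).
    return sorted(records, key=lambda r: (r["box"][1], r["box"][0]))


# Hypothetical index order, for illustration only; use model.config.id2label in practice.
EXAMPLE_ID2LABEL = dict(enumerate([
    "Caption", "Footnote", "Formula", "List-item", "Page-footer", "Page-header",
    "Picture", "Section-header", "Table", "Text", "Title",
]))

# Made-up detections standing in for one element of bbox_pred.
example = {
    "scores": [0.91, 0.87],
    "labels": [9, 10],
    "boxes": [[50.0, 200.0, 550.0, 400.0], [50.0, 40.0, 550.0, 90.0]],
}

records = to_records(example, EXAMPLE_ID2LABEL)
for record in records:
    print(record["label"], round(record["score"], 2), record["box"])
```

With the objects from the snippet above, the same helper could be called as `to_records(bbox_pred[0], model.config.id2label)`.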

### Citation

```
@online{DeDetrLay,
  AUTHOR = {Cyrile Delestre},
  URL = {https://huggingface.co/cmarkea/detr-base-layout-detection},
  YEAR = {2024},
  KEYWORDS = {Image Processing ; Transformers ; Layout},
}
```