How to fine-tune microsoft/table-transformer-detection in Hugging Face?

#16
by Spondon - opened

Dear All,

After reading all the threads available on the internet, I am using the script below to fine-tune table-transformer-detection:
https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb

I have replaced:

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")

with:

processor = DetrImageProcessor()
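For reference, preprocessing with the default processor looks roughly like this (a minimal sketch; the dummy image and annotation values are placeholders):

```python
import numpy as np
from PIL import Image
from transformers import DetrImageProcessor

# Default constructor: resizes the shortest edge to 800 px and normalizes
# with ImageNet mean/std, the preprocessing the DETR/TATR checkpoints expect.
processor = DetrImageProcessor()

image = Image.fromarray(np.zeros((480, 640, 3), dtype=np.uint8))  # dummy page
target = {
    "image_id": 0,
    "annotations": [
        # COCO-style object: bbox is [x, y, width, height] in pixels
        {"bbox": [10, 10, 200, 100], "category_id": 0, "area": 20000, "iscrowd": 0},
    ],
}
encoding = processor(images=image, annotations=target, return_tensors="pt")
pixel_values = encoding["pixel_values"]  # (1, 3, H, W), resized and normalized
labels = encoding["labels"][0]           # dict with "class_labels", "boxes", ...
```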

and also replaced:

DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    revision="no_timm",
    num_labels=len(id2label),
    ignore_mismatched_sizes=True,
)

with:

TableTransformerForObjectDetection.from_pretrained(
"microsoft/table-transformer-structure-recognition",
ignore_mismatched_sizes=True,
)
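One thing worth double-checking: this replacement drops the num_labels argument, and it loads the structure-recognition checkpoint while the goal here is detection. A minimal sketch of loading a checkpoint with a custom label set (the label names are made up for illustration; adjust to your dataset):

```python
from transformers import TableTransformerForObjectDetection

# Hypothetical label set for a custom table-detection dataset.
id2label = {0: "table"}
label2id = {v: k for k, v in id2label.items()}

model = TableTransformerForObjectDetection.from_pretrained(
    "microsoft/table-transformer-detection",  # detection checkpoint
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,  # re-initializes the classification head
)
```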

After fine-tuning the model using Trainer(max_steps=3000, gradient_clip_val=0.1), I am getting the very low accuracy below:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.334
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.539
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.356
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.223
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.468
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.487
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.487

My dataset size is:
Number of training examples: 159
Number of validation examples: 19

Any thoughts on this?
P.S. I know about https://github.com/microsoft/table-transformer/ and how to fine-tune using that project, and I also know about convert_table_transformer_original_pytorch_checkpoint_to_pytorch.py in the transformers repo. But my question is: why is the fine-tuning above giving me such low accuracy? Should I increase my dataset size, or am I missing anything? I am using a proprietary dataset.

Hi @Spondon. How did you pre-process (create the dataset from) your own custom data? Any code or links for it?
Thanks.

Hi @mali17361 ,
Code is proprietary, but the dataset format is COCO.

Interestingly, when I convert the dataset to PASCAL VOC format and fine-tune using the Table Transformer source script (https://github.com/microsoft/table-transformer/), I get the accuracy below after 16 epochs:

IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.823
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.959
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.880
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.823
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.533
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.877
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.887
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.887
pubmed: AP50: 0.959, AP75: 0.880, AP: 0.823, AR: 0.887

Any thoughts on this?

Hi @Spondon. Thanks for the reply.
I was able to convert my own data to COCO format, create a custom dataset, and fine-tune the model.
I'm also getting very low accuracy scores.
Maybe our datasets are too small. What do you think?
Have you tried any other method?
I also have an issue with the output classes: "table-structure-recognition" has 6 output classes, the balloon dataset has only 1, and my dataset has 2. When I run inference on outside data, the outputs come in only 1 or 2 tensors, i.e. only 2 classes rather than all 6.

Hi @Spondon ,

We just updated our object detection guide (for easier mAP calculation with the Trainer API): https://huggingface.co/docs/transformers/main/en/tasks/object_detection, and we have now also added official object detection scripts (both with the Trainer API and Accelerate): https://github.com/huggingface/transformers/tree/main/examples/pytorch/object-detection.

Definitely recommend these guides for fine-tuning Table Transformer on a custom dataset.

Hi @Spondon , @mali17361

Can you tell me how you created your custom dataset? I am not asking for the code; just general information about the labeling tool and the dataset structure would help. I want to create a custom dataset for fine-tuning. Can you also tell me the minimum number of tables needed to fine-tune the Table Transformer model? Your input would help a lot!

Thanks

Hi @icecandyman

I created my dataset using Label Studio, based on this template: https://labelstud.io/templates/image_bbox

You can customize the labels as you wish; here's my template, if it helps:

<View>
  <View style="display:flex; align-items:start; flex-direction:row; gap:8px;">
    <View display="block" style="width:100%;">
      <Image name="image" value="$ocr" zoom="true" width="100%" height="100%" zoomControl="false" rotateControl="false" horizontalAlignment="center" crosshair="true"/>
    </View>
   </View>

  <RectangleLabels name="label" toName="image">
    <Label value="table" background="green"/>
    <Label value="header" background="red"/>
    <Label value="row" background="blue"/>
    <Label value="column" background="yellow"/>
  </RectangleLabels>
</View>

Once finished, you can export in COCO or PASCAL VOC format.

Hi @Paul21777
Thanks for the reply!
What was the structure of your dataset? And which method/script did you use to fine-tune? There are multiple, and the source script at https://github.com/microsoft/table-transformer/ is not working for me; it is giving import errors!

@icecandyman
I'm currently working on it following https://huggingface.co/docs/transformers/tasks/object_detection. There you can find the format expected by Table Transformer (in the link it is done for DETR, but it works the same for TATR; we just have to change the model name and the way images are processed).

 The examples in the dataset have the following fields:

    image_id: the example image id
    image: a PIL.Image.Image object containing the image
    width: width of the image
    height: height of the image
    objects: a dictionary containing bounding box metadata for the objects in the image:
        id: the annotation id
        area: the area of the bounding box
        bbox: the object’s bounding box (in the COCO format )
        category: the object’s category, with possible values including Coverall (0), Face_Shield (1), Gloves (2), Goggles (3) and Mask (4)
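Concretely, a single example in that format could look like this (values are invented; the category ids are placeholders for a two-class table dataset rather than the CPPE-5 classes quoted above):

```python
import numpy as np
from PIL import Image

# Invented example following the field layout described above
# (0 = table, 1 = header, as hypothetical categories).
example = {
    "image_id": 7,
    "image": Image.fromarray(np.zeros((1024, 768, 3), dtype=np.uint8)),
    "width": 768,
    "height": 1024,
    "objects": {
        "id": [101, 102],
        "area": [120000.0, 30000.0],
        "bbox": [  # COCO format: [x, y, width, height] in pixels
            [50.0, 100.0, 600.0, 200.0],
            [50.0, 100.0, 600.0, 50.0],
        ],
        "category": [0, 1],
    },
}
```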

Apparently, even the COCO-format bounding-box annotations that Label Studio exports are not directly suited for training. I'm currently writing a script to convert them; I will send it here ASAP if you want.
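For reference, one way such a conversion could look (a hedged sketch only: it assumes a standard COCO export with "images" and "annotations" sections, and the output field names follow the guide quoted above):

```python
import json
from collections import defaultdict

def coco_to_examples(coco_path):
    """Regroup a COCO export into per-example 'objects' dicts."""
    with open(coco_path) as f:
        coco = json.load(f)

    # Index annotations by the image they belong to.
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)

    examples = []
    for img in coco["images"]:
        anns = anns_by_image[img["id"]]
        examples.append({
            "image_id": img["id"],
            "image": img["file_name"],  # load with PIL.Image.open at training time
            "width": img["width"],
            "height": img["height"],
            "objects": {
                "id": [a["id"] for a in anns],
                # Fall back to w*h when the export omits "area".
                "area": [a.get("area", a["bbox"][2] * a["bbox"][3]) for a in anns],
                "bbox": [a["bbox"] for a in anns],  # COCO [x, y, w, h]
                "category": [a["category_id"] for a in anns],
            },
        })
    return examples
```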

@Paul21777
Hi, I probably need that script to convert my current dataset for fine-tuning this model. Could you share it?
