---
library_name: transformers
language:
- en
- es
base_model:
- facebook/detr-resnet-101
---
# Model Card for Rodr16020/detr_handwriten_cursive_text_detection

DETR detects handwritten and cursive text and generates bounding boxes around it. This model was fine-tuned from the base model facebook/detr-resnet-101.

The dataset used for fine-tuning is still under development and may be released in a future version.

The model mainly detects Spanish text.

Note: the default number of generated bounding boxes was kept (num_queries: 100). Modifying this value when using the model could lead to unexpected behavior.
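As a quick sanity check, the query count can be inspected after loading the checkpoint. This is a minimal sketch; the checkpoint name is the one used in the Direct Use example below:

```python
# Minimal sketch: confirm the checkpoint keeps the default query count,
# which caps the number of boxes predicted per image.
from transformers import DetrForObjectDetection

model = DetrForObjectDetection.from_pretrained("Rodr16020/detr_handwriten_cursive_text_detection")
print(model.config.num_queries)  # expected: 100
```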
## Model Details

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Rodrigo Alvarez
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Text detection / bounding-box generation
- **Language(s) (NLP):** en (default), es-MX (fine-tuned)
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** facebook/detr-resnet-101

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [DETR TROCR Lab](https://github.com/rodrigoalvarez-20/detr_trocr_handwritten_text/development)
- **Paper [optional]:** *Work in progress*
- **Demo [optional]:** [Demo](https://github.com/rodrigoalvarez-20/detr_trocr_handwritten_text/blob/development/detr_lab.ipynb)

## Uses

### Direct Use
```python
from transformers import DetrForObjectDetection, DetrImageProcessor
import torch
import cv2
import supervision as sv

# User-defined constants
MODEL_CHECKPOINT = "Rodr16020/detr_handwriten_cursive_text_detection"
DEVICE = "cuda"
CONFIDENCE_THRESHOLD = 0.5  # Keep only boxes with a confidence score >= this value
IOU_THRESHOLD = 0.5
TEST_IMAGE = "demo.jpeg"  # Path to the test image

# Load the model and preprocessor
img_proc = DetrImageProcessor.from_pretrained(MODEL_CHECKPOINT)
detr_model = DetrForObjectDetection.from_pretrained(
    pretrained_model_name_or_path=MODEL_CHECKPOINT,
    ignore_mismatched_sizes=True
).to(DEVICE)

# Read the image as a pixel matrix
image = cv2.imread(TEST_IMAGE)

# Inference
with torch.no_grad():
    # Preprocess the image and predict
    inputs = img_proc(images=image, return_tensors='pt').to(DEVICE)
    outputs = detr_model(**inputs)
    # Post-process: rescale the generated bounding-box coords to the original image size
    target_sizes = torch.tensor([image.shape[:2]]).to(DEVICE)
    results = img_proc.post_process_object_detection(
        outputs=outputs,
        threshold=CONFIDENCE_THRESHOLD,
        target_sizes=target_sizes
    )[0]

# Extract all the generated bounding boxes
boxes = results["boxes"].tolist()

# With the supervision library, use the generated coords to annotate the image and preview the boxes
box_annotator = sv.BoxAnnotator()
detections = sv.Detections.from_transformers(transformers_results=results).with_nms(threshold=0.1)
labels = [f"{confidence:.2f}" for _, _, confidence, class_id, _ in detections]
frame = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)
sv.plot_image(frame, (16, 16))
```
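If the `supervision` dependency is not available, the post-processed `results` dict (with `scores`, `labels`, and `boxes` in pixel `xyxy` coordinates) can also be read directly. A minimal sketch:

```python
# Minimal sketch: print each detection from the transformers post-processing output.
for score, box in zip(results["scores"].tolist(), results["boxes"].tolist()):
    x_min, y_min, x_max, y_max = box
    print(f"score={score:.2f}  box=({x_min:.0f}, {y_min:.0f}, {x_max:.0f}, {y_max:.0f})")
```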
#### Training Hyperparameters
- Dataset format: COCO
- Device: CUDA
- WEIGHT_DECAY = 3e-3
- CLIP_GRAD = 1e-4
- BATCH_SIZE = 8
- ACC_BATCH = BATCH_SIZE * 4 (gradient accumulation; see the sketch after this list)
- MODEL_LR = 5e-4 (the values reported in some articles did not work directly in this setup, so this value was chosen experimentally and works reasonably well)
- BB_LR = 5e-4 (same as above)
- MAX_EPOCHS = 300 (use at least 50; the model stops learning around step 70)
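A minimal sketch of how these values could be wired into a PyTorch training loop. The optimizer choice (AdamW), the separate parameter group for the backbone (reading BB_LR as the backbone learning rate), and the dataloader are assumptions, not taken from the original training script:

```python
import torch
from transformers import DetrForObjectDetection

WEIGHT_DECAY, CLIP_GRAD = 3e-3, 1e-4
BATCH_SIZE, ACC_STEPS = 8, 4          # effective batch: BATCH_SIZE * 4
MODEL_LR, BB_LR = 5e-4, 5e-4
MAX_EPOCHS = 300

model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-101").to("cuda")
# Assumption: BB_LR is a separate learning rate for the ResNet backbone.
param_groups = [
    {"params": [p for n, p in model.named_parameters() if "backbone" not in n], "lr": MODEL_LR},
    {"params": [p for n, p in model.named_parameters() if "backbone" in n], "lr": BB_LR},
]
optimizer = torch.optim.AdamW(param_groups, weight_decay=WEIGHT_DECAY)

def train(dataloader):
    """dataloader yields COCO-style batches prepared by DetrImageProcessor."""
    model.train()
    for epoch in range(MAX_EPOCHS):
        for step, batch in enumerate(dataloader):
            outputs = model(
                pixel_values=batch["pixel_values"].to("cuda"),
                labels=[{k: v.to("cuda") for k, v in t.items()} for t in batch["labels"]],
            )
            (outputs.loss / ACC_STEPS).backward()      # gradient accumulation
            if (step + 1) % ACC_STEPS == 0:
                torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_GRAD)
                optimizer.step()
                optimizer.zero_grad()
```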
#### Speeds, Sizes, Times [optional]
### Compute Infrastructure
A small, simple computer at the CIC research lab.

When fine-tuning, the model and data used a total of
#### Hardware
- ASRock Z370/OEM motherboard
- Corsair 4000D Airflow case
- Intel Core i7-8700K processor
- XPG Spectrix DDR4 RAM, 3200 MHz, 16 GB (x4)
- Western Digital WD My Passport external SSD, 1 TB
- NVIDIA GeForce RTX 4090, 24 GB
- Corsair RMx Series RM1000x, 1000 W
#### Software
- transformers
- pytorch
- tensorboard
- cv2 (opencv-python)
- supervision

And possibly others.
## Citation [optional]