Committed by Rodr16020
Commit 966536d · verified · 1 Parent(s): e38021a

Update README.md

Files changed (1)
  1. README.md +84 -16
README.md CHANGED
@@ -1,13 +1,18 @@
1
  ---
2
  library_name: transformers
3
- tags: []
4
  ---
5
 
6
  # Model Card for Model ID
7
 
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
 
11
 
12
  ## Model Details
13
 
@@ -17,21 +22,21 @@ tags: []
17
 
18
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
 
20
- - **Developed by:** [More Information Needed]
21
  - **Funded by [optional]:** [More Information Needed]
22
  - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
  - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
 
28
  ### Model Sources [optional]
29
 
30
  <!-- Provide the basic links for the model. -->
31
 
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
 
36
  ## Uses
37
 
@@ -39,7 +44,48 @@ This is the model card of a 🤗 transformers model that has been pushed on the
39
 
40
  ### Direct Use
41
 
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
 
44
  [More Information Needed]
45
 
@@ -92,7 +138,15 @@ Use the code below to get started with the model.
92
 
93
  #### Training Hyperparameters
94
 
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
 
97
  #### Speeds, Sizes, Times [optional]
98
 
@@ -158,15 +212,29 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
158
 
159
  ### Compute Infrastructure
160
 
161
- [More Information Needed]
 
 
162
 
163
  #### Hardware
164
 
165
- [More Information Needed]
166
 
167
  #### Software
168
 
169
- [More Information Needed]
170
 
171
  ## Citation [optional]
172
 
 
1
  ---
2
  library_name: transformers
3
+ language:
4
+ - en
5
+ - es
6
+ base_model:
7
+ - facebook/detr-resnet-101
8
  ---
9
 
10
  # Model Card for Model ID
11
 
12
+ This DETR model detects handwritten and cursive text and generates a bounding box for each detected region. It was fine-tuned from the base model facebook/detr-resnet-101.
13
+ The training dataset is still under development and may be released in a future version.
14
+ The model mainly detects Spanish text.
15
+ Note: The model was trained with the default number of generated bounding boxes (num_queries: 100). Changing this value when using the model could lead to unexpected behavior.
16
 
17
  ## Model Details
18
 
 
22
 
23
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
24
 
25
+ - **Developed by:** Rodrigo Alvarez
26
  - **Funded by [optional]:** [More Information Needed]
27
  - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** Text Detection / Bounding Box generation
29
+ - **Language(s) (NLP):** en (default), es-MX (finetuned)
30
  - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** facebook/detr-resnet-101
32
 
33
  ### Model Sources [optional]
34
 
35
  <!-- Provide the basic links for the model. -->
36
 
37
+ - **Repository:** [DETR TROCR Lab](https://github.com/rodrigoalvarez-20/detr_trocr_handwritten_text/development)
38
+ - **Paper [optional]:** *Work in progress*
39
+ - **Demo [optional]:** [Demo](https://github.com/rodrigoalvarez-20/detr_trocr_handwritten_text/blob/development/detr_lab.ipynb)
40
 
41
  ## Uses
42
 
 
44
 
45
  ### Direct Use
46
 
47
+ ```python
+ import cv2
+ import supervision as sv
+ import torch
+ from transformers import DetrForObjectDetection, DetrImageProcessor
+
+ # User-defined constants
+ MODEL_CHECKPOINT = "Rodr16020/detr_handwriten_cursive_text_detection"
+ DEVICE = "cuda"
+ CONFIDENCE_THRESHOLD = 0.5  # Minimum confidence score used to filter the generated boxes
+ IOU_THRESHOLD = 0.5
+ TEST_IMAGE = "demo.jpeg"  # Path to the test image
+
+ # Load the model and the image processor
+ img_proc = DetrImageProcessor.from_pretrained(MODEL_CHECKPOINT)
+ detr_model = DetrForObjectDetection.from_pretrained(
+     pretrained_model_name_or_path=MODEL_CHECKPOINT,
+     ignore_mismatched_sizes=True,
+ ).to(DEVICE)
+
+ # Read the image as a pixel matrix (height x width x channels)
+ image = cv2.imread(TEST_IMAGE)
+
+ # Inference
+ with torch.no_grad():
+     # Preprocess the image and predict
+     inputs = img_proc(images=image, return_tensors="pt").to(DEVICE)
+     outputs = detr_model(**inputs)
+
+     # Post-process: rescale the generated bounding-box coordinates to the original image size
+     target_sizes = torch.tensor([image.shape[:2]]).to(DEVICE)
+     results = img_proc.post_process_object_detection(
+         outputs=outputs,
+         threshold=CONFIDENCE_THRESHOLD,
+         target_sizes=target_sizes,
+     )[0]
+
+ # Extract all the generated bounding boxes as [x_min, y_min, x_max, y_max] lists
+ boxes = results["boxes"].tolist()
+
+ # With the supervision library, use the generated coordinates to annotate the image and preview the boxes
+ box_annotator = sv.BoxAnnotator()
+ detections = sv.Detections.from_transformers(transformers_results=results).with_nms(threshold=0.1)
+ labels = [f"{confidence:.2f}" for confidence in detections.confidence]
+ frame = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)
+ sv.plot_image(frame, (16, 16))
+ ```
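The snippet above filters predictions twice: first by confidence (the `threshold` argument of `post_process_object_detection`), then by overlap (`with_nms`). The following is a minimal pure-Python sketch of those two steps, assuming boxes in `(x_min, y_min, x_max, y_max)` pixel format. The helper names `iou` and `filter_and_nms` are hypothetical, and this is an illustration of the idea rather than the implementation used by `transformers` or `supervision`.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, confidence_threshold=0.5, iou_threshold=0.1):
    """Drop low-confidence boxes, then greedily suppress overlapping ones."""
    # Step 1: keep boxes above the confidence threshold, best score first
    candidates = sorted(
        (pair for pair in zip(boxes, scores) if pair[1] > confidence_threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Step 2: keep a box only if it does not overlap an already kept box too much
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Made-up example values: two near-duplicate boxes, one separate box, one weak box
example_boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (200, 10, 300, 60), (0, 0, 5, 5)]
example_scores = [0.90, 0.75, 0.80, 0.20]
kept = filter_and_nms(example_boxes, example_scores)
```

A low `iou_threshold` such as 0.1 is aggressive: any box that overlaps a higher-scoring box even slightly is discarded, which suits text regions that should not overlap.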
89
 
90
  [More Information Needed]
91
 
 
138
 
139
  #### Training Hyperparameters
140
 
141
+ - Dataset Format: COCO
142
+ - Device: CUDA
143
+ - WEIGHT_DECAY = 3e-3
144
+ - CLIP_GRAD = 1e-4 #0.001
145
+ - BATCH_SIZE = 8
146
+ - ACC_BATCH = BATCH_SIZE * 4
147
+ - MODEL_LR = 5e-4 # Some articles set this value to 5e-4, but that did not work in my case, so I tried this value and it works "well"
148
+ - BB_LR = 5e-4 # Same as above
149
+ - MAX_EPOCHS = 300 # Use >= 50; the model stops learning near step 70
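The hyperparameters above list two learning rates (`MODEL_LR` and `BB_LR`) and an accumulated batch size (`ACC_BATCH = BATCH_SIZE * 4`). Below is a minimal sketch of how such values are commonly wired together. It assumes that `BB_LR` applies to the ResNet backbone, that backbone parameters can be recognised by the substring `backbone` in their names, and that `ACC_BATCH` is the effective batch size reached through gradient accumulation. None of these assumptions is confirmed by the source, and `build_param_groups` is a hypothetical helper.

```python
BATCH_SIZE = 8
ACC_BATCH = BATCH_SIZE * 4
MODEL_LR = 5e-4
BB_LR = 5e-4
WEIGHT_DECAY = 3e-3

# Assumption: number of mini-batches accumulated before each optimizer step
ACCUMULATION_STEPS = ACC_BATCH // BATCH_SIZE


def build_param_groups(named_parameters, model_lr=MODEL_LR, backbone_lr=BB_LR):
    """Split (name, parameter) pairs into backbone and non-backbone groups."""
    backbone, rest = [], []
    for name, param in named_parameters:
        # Assumption: backbone parameters contain "backbone" in their name
        (backbone if "backbone" in name else rest).append(param)
    return [
        {"params": rest, "lr": model_lr},
        {"params": backbone, "lr": backbone_lr},
    ]


# Stand-in (name, parameter) pairs; a real model would supply tensors instead
demo_parameters = [
    ("model.backbone.conv1.weight", "w0"),
    ("model.encoder.layers.0.weight", "w1"),
    ("class_labels_classifier.weight", "w2"),
]
param_groups = build_param_groups(demo_parameters)
```

With PyTorch, groups built from `model.named_parameters()` could be passed to an optimizer such as `torch.optim.AdamW(param_groups, weight_decay=WEIGHT_DECAY)`; the optimizer actually used for this model is not stated in the card.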
150
 
151
  #### Speeds, Sizes, Times [optional]
152
 
 
212
 
213
  ### Compute Infrastructure
214
 
215
+ A small, simple computer at the CIC research lab.
216
+
217
+ When finetuning, the model and data used a total of
218
 
219
  #### Hardware
220
 
221
+ - ASRock Z370/OEM motherboard
222
+ - Corsair 4000D Airflow case
223
+ - Intel Core i7-8700K processor
224
+ - XPG Spectrix DDR4 RAM, 3200 MHz, 16 GB (x4)
225
+ - Western Digital WD My Passport external SSD, 1 TB
226
+ - NVIDIA GeForce RTX 4090 24GB
227
+ - Corsair RMX Series RM1000x power supply, 1000 W
228
 
229
  #### Software
230
 
231
+ - transformers
232
+ - pytorch
233
+ - tensorboard
234
+ - cv2
235
+ - supervision
236
+
237
+ And possibly others
238
 
239
  ## Citation [optional]
240