ucsahin committed 7bb28ce (verified) · Parent: ff204b0 · Update README.md
Files changed (1): README.md (+126 −5)

This model is a fine-tuned version of [ucsahin/TraVisionLM-base](https://huggingface.co/ucsahin/TraVisionLM-base).
It achieves the following results on the evaluation set:
- Loss: 2.2919

📝𝗘𝗡 **You can find the training script at [Google Colab](https://colab.research.google.com/drive/1XXnG9QXOvuNtWkr9OPkx0GbnPLYGF-rV?usp=sharing)**

🤖𝗘𝗡 **For all the details about the base model, please check out [ucsahin/TraVisionLM-base](https://huggingface.co/ucsahin/TraVisionLM-base)**

📝🇹🇷 **You can find the Colab notebook that shows how the training was done here: [Google Colab](https://colab.research.google.com/drive/1XXnG9QXOvuNtWkr9OPkx0GbnPLYGF-rV?usp=sharing)**

🤖🇹🇷 **For all the details about the base model, see [ucsahin/TraVisionLM-base](https://huggingface.co/ucsahin/TraVisionLM-base)**

## Model description
### English
This object detection model is a fine-tuned version of [ucsahin/TraVisionLM-base](https://huggingface.co/ucsahin/TraVisionLM-base), trained with the ```Trainer``` class from the ```Transformers``` library.
It shows that you can fine-tune the base model on a downstream task so that it acquires the ability to complete that task.

⚠️**Note**: *Object detection is a complex task that demands extensive, high-quality training data. This model has been trained on a dataset of 150K samples and is currently designed to detect a single object per image. As a result, it may sometimes produce inaccurate results.*
*The initial findings indicate that significantly more training data will be necessary to develop a more reliable object detection model. However, the results achieved so far demonstrate a promising foundation.*

Examples of the task prompts used are ```"İşaretle: *object_name*"``` and ```"Tespit et: *object_name*"```.

### Turkish
This object detection model is a fine-tuned version of [ucsahin/TraVisionLM-base](https://huggingface.co/ucsahin/TraVisionLM-base), built with the ```Trainer``` class from the ```Transformers``` library. It shows that you can fine-tune the base model on any downstream task and give it the ability to complete that task.

⚠️**Note**: *Object detection is a complex task that requires large amounts of high-quality training data. This model was trained on 150K samples and is currently designed to detect only a single object per image. It may therefore sometimes produce inaccurate results.*
*Initial findings show that far more training data will be needed to build a more reliable object detection model. However, the results obtained so far provide a promising starting point.*

Examples of the task prompts used are: ```"İşaretle: *nesne_adı*"``` and ```"Tespit et: *nesne_adı*"```.
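
For illustration, such a prompt can be put together as in the minimal sketch below; `object_name` is just a placeholder variable for this example, not part of the model's API:

```python
# Build a detection prompt for an arbitrary object (placeholder value shown)
object_name = "araba"  # "car"
prompt = f"İşaretle: {object_name}"
# or: prompt = f"Tespit et: {object_name}"
```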

You can easily load the model and run inference using the ```Transformers``` library:

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
import requests
from PIL import Image

model = AutoModelForCausalLM.from_pretrained('ucsahin/TraVisionLM-Object-Detection-ft', trust_remote_code=True, device_map="cuda")
# you can also load the model in bfloat16 or float16
# model = AutoModelForCausalLM.from_pretrained('ucsahin/TraVisionLM-Object-Detection-ft', trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="cuda")
processor = AutoProcessor.from_pretrained('ucsahin/TraVisionLM-Object-Detection-ft', trust_remote_code=True)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

prompt = "İşaretle: araba"
# prompt = "Tespit et: araba"

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.9, top_k=50, repetition_penalty=1.2)

output_text = processor.batch_decode(outputs, skip_special_tokens=True)[0]

print("Model response: ", output_text)
"""
Model response: İşaretle: araba
<loc0048><loc0338><loc0912><loc0819> araba;
"""
```
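
The call above uses sampling; if you prefer deterministic outputs for detection, a greedy-decoding variant such as the following should also work (a minimal sketch reusing the `inputs` from the snippet above):

```python
# Greedy decoding: deterministic output, often convenient for detection prompts
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
output_text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print("Model response: ", output_text)
```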

𝗘𝗡 The model has special tokens for bounding box locations: ``"<loc0000>, <loc0001>, ..., <loc1024>"``. The bounding box coordinates are given as special <loc[value]> tokens, where value is a number representing a normalized coordinate. Each detection is represented by four location coordinates in the order x_min (left), y_min (top), x_max (right), y_max (bottom), followed by the label detected in that box. To convert the values to coordinates, first divide the numbers by 1024, then multiply y by the image height and x by the image width. This gives you the coordinates of the bounding boxes relative to the original image size.

🇹🇷 The model has special tokens for bounding box locations: ``"<loc0000>, <loc0001>, ..., <loc1024>"``. The bounding box coordinates are expressed as special <loc[value]> tokens, where value is a number representing a normalized coordinate. Each detection is represented by four location coordinates, in the order x_min (left), y_min (top), x_max (right), y_max (bottom), followed by the label detected in that box. To convert the values to coordinates, first divide the numbers by 1024, then multiply y by the image height and x by the image width. This gives the coordinates of the bounding boxes relative to the original image size.
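
As a quick worked example of that conversion, here is a minimal sketch applied to the sample output above; the 640×480 image size is an assumed value for illustration only:

```python
# Decode the sample output "<loc0048><loc0338><loc0912><loc0819> araba" into pixel coordinates.
# The 640x480 image size is an assumed example value, not taken from the model card.
width, height = 640, 480
loc_values = [48, 338, 912, 819]        # numbers parsed from the <loc....> tokens
norm = [v / 1024 for v in loc_values]   # normalize to [0, 1]
x_min, x_max = norm[0] * width, norm[2] * width
y_min, y_max = norm[1] * height, norm[3] * height
print([int(x_min), int(y_min), int(x_max), int(y_max)])  # [30, 158, 570, 383]
```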

For post-processing the bounding boxes and plotting them on the image, you can use the following:
```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import re

plt.rcParams['font.family'] = 'DejaVu Sans'

def plot_bbox(image, labels, bboxes):
    # Create a figure and axes
    fig, ax = plt.subplots()

    # Display the image
    ax.imshow(image)

    # Plot each bounding box with its label
    for bbox, label in zip(bboxes, labels):
        # Unpack the bounding box coordinates
        x1, y1, x2, y2 = bbox
        # Create a Rectangle patch
        rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=1, edgecolor='r', facecolor='none')
        # Add the rectangle to the Axes
        ax.add_patch(rect)
        # Annotate the label
        plt.text(x1, y1, label, color='white', fontsize=8, bbox=dict(facecolor='red', alpha=0.5))

    # Remove the axis ticks and labels
    ax.axis('off')

    # Show the plot
    plt.show()

def extract_loc_values_and_labels(bbox_str, width, height):
    # Find groups of four <loc....> tokens followed by a label
    bbox_label_pairs = re.findall(r'((?:<loc\d+>){4})\s*([\w\s]+)', bbox_str)

    bboxes = []
    labels = []

    for bbox, label in bbox_label_pairs:
        # Parse the four location values and normalize them to [0, 1]
        loc_values = re.findall(r'<loc(\d+)>', bbox)
        loc_values = [int(x) for x in loc_values]
        loc_values = [value / 1024 for value in loc_values]
        # Convert to PASCAL VOC format (x_min, y_min, x_max, y_max) in pixels
        loc_values = [
            int(loc_values[0] * width), int(loc_values[1] * height),
            int(loc_values[2] * width), int(loc_values[3] * height),
        ]
        bboxes.append(loc_values)
        labels.append(label)

    return bboxes, labels
```

Then,

```python
bboxes, labels = extract_loc_values_and_labels(output_text, image.width, image.height)
print("bboxes: ", bboxes)
print("labels: ", labels)
plot_bbox(image, labels, bboxes)
```
 
and the final result will be:

![Detected car with bounding box](detected_car.png "Detected car")

## Training procedure