Commit d1037c0 · verified · committed by cnmoro · 1 Parent(s): 4ab539f

Update README.md

Files changed (1)
  1. README.md +60 -3
README.md CHANGED
@@ -1,3 +1,60 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - google/bert_uncased_L-4_H-256_A-4
+ - WinKawaks/vit-small-patch16-224
+ pipeline_tag: image-to-text
+ library_name: transformers
+ tags:
+ - vit
+ - bert
+ - vision
+ - caption
+ - captioning
+ - image
+ ---
+ An image captioning model based on bert-mini and vit-small, weighing only 133 MB.
+
+ Runs quickly on CPU: the example below generates a caption in about 0.19 seconds.
+
+ ```python
+ from transformers import AutoTokenizer, AutoImageProcessor, VisionEncoderDecoderModel
+ import requests, torch, time
+ from PIL import Image
+
+ model_path = "cnmoro/mini-image-captioning"
+ device = torch.device("cpu")
+
+ # load the image captioning model and the corresponding tokenizer and image processor
+ model = VisionEncoderDecoderModel.from_pretrained(model_path).to(device)
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ image_processor = AutoImageProcessor.from_pretrained(model_path)
+
+ # download and preprocess an image
+ url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/New_york_times_square-terabass.jpg/800px-New_york_times_square-terabass.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+ pixel_values = image_processor(image, return_tensors="pt").pixel_values.to(device)
+
+ start = time.time()
+
+ # generate caption - suggested settings
+ # note: temperature, top_p and top_k only take effect when sampling is enabled (do_sample=True)
+ generated_ids = model.generate(
+     pixel_values,
+     temperature=0.7,
+     top_p=0.8,
+     top_k=50,
+     num_beams=3,  # you can use 1 for even faster inference with a small drop in quality
+ )
+ generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+
+ end = time.time()
+
+ print(generated_text)
+ # a large group of people walking through a busy city.
+
+ print(f"Time taken: {end - start} seconds")
+ # Time taken: 0.19002342224121094 seconds (on CPU)
+ ```
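
The README example times a single `generate` call, so the reported figure can include one-off warm-up cost and run-to-run noise. As a rough sketch (not part of the commit), the timing could be averaged over several runs with a small helper. The name `time_call` and its `warmup` and `repeats` parameters are hypothetical and chosen here for illustration only.

```python
import statistics
import time


def time_call(fn, warmup=1, repeats=5):
    """Call fn() `warmup` times untimed, then `repeats` times timed.

    Returns (mean_seconds, stdev_seconds); stdev is 0.0 for a single repeat.
    """
    for _ in range(warmup):
        fn()  # untimed warm-up runs

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append(time.perf_counter() - start)

    mean_s = statistics.mean(samples)
    std_s = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean_s, std_s


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without downloading the model.
    mean_s, std_s = time_call(lambda: sum(range(100_000)))
    print(f"mean: {mean_s:.6f} s, stdev: {std_s:.6f} s")
```

With the `model` and `pixel_values` objects from the README example, the same helper could be used as `time_call(lambda: model.generate(pixel_values, num_beams=3))`, and repeated with `num_beams=1` to compare the two settings on a given machine.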