Sachinthaka Abeywardana committed
Commit · e041481
1 Parent(s): 8f21e3d
readme
README.md ADDED
@@ -0,0 +1,37 @@
---
language:
- en
tags:
- image-to-text
license: mit
datasets:
- coco2017
---

# Vit2-DistilGPT2

This model takes an image as input and outputs a caption. It was trained on the COCO dataset; the full training script is available in [this Kaggle kernel](https://www.kaggle.com/sachin/visionencoderdecoder-model-training).

## Usage
```python
from PIL import Image
from transformers import GPT2Tokenizer, ViTFeatureExtractor, VisionEncoderDecoderModel

# VisionEncoderDecoderModel provides generate(); the generic AutoModel class may not resolve this checkpoint
model = VisionEncoderDecoderModel.from_pretrained("sachin/vit2distilgpt2")
vit_feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
# make sure GPT2 adds BOS at the beginning and EOS at the end of each sequence
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
    outputs = [self.bos_token_id] + token_ids_0 + [self.eos_token_id]
    return outputs

GPT2Tokenizer.build_inputs_with_special_tokens = build_inputs_with_special_tokens
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
# set pad_token_id to unk_token_id -> be careful here as unk_token_id == eos_token_id == bos_token_id
gpt2_tokenizer.pad_token = gpt2_tokenizer.unk_token
# image_path is the path to the input image file
image = vit_feature_extractor(Image.open(image_path).convert("RGB"), return_tensors="pt").pixel_values
# pixel_values already includes the batch dimension, so it is passed to generate() as-is
encoder_outputs = model.generate(image)
generated_sentences = gpt2_tokenizer.batch_decode(encoder_outputs, skip_special_tokens=True)
```
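
The tokenizer patch in the usage snippet can be checked without downloading any model. The sketch below is only an illustration: `stub` is a hypothetical stand-in for the tokenizer, and `50256` is the id of GPT-2's `<|endoftext|>` token, which GPT-2 uses for BOS, EOS and UNK alike.

```python
from types import SimpleNamespace

# Id of GPT-2's <|endoftext|> token (shared by BOS, EOS and UNK).
GPT2_ENDOFTEXT_ID = 50256


def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
    # Same logic as the patch in the usage snippet: wrap the ids in BOS ... EOS.
    return [self.bos_token_id] + token_ids_0 + [self.eos_token_id]


# Hypothetical stand-in for the tokenizer; only the two attributes the patch reads.
stub = SimpleNamespace(bos_token_id=GPT2_ENDOFTEXT_ID, eos_token_id=GPT2_ENDOFTEXT_ID)

wrapped = build_inputs_with_special_tokens(stub, [10, 11, 12])
print(wrapped)  # [50256, 10, 11, 12, 50256]
```

Because the pad token is set to the same id, padding cannot be told apart from BOS/EOS by id alone, which is what the "be careful" comment in the snippet refers to.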

Note that the generated caption may contain repeated sentences, so a post-processing step may be required.
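
One possible post-processing step is to drop sentences that repeat an earlier one. The sketch below is a minimal example and not part of the original training or inference code; `dedupe_caption` is a hypothetical helper name, and it assumes sentences are separated by `.`, `!` or `?` followed by whitespace.

```python
import re


def dedupe_caption(caption: str) -> str:
    """Remove sentences that repeat an earlier sentence (case-insensitive)."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", caption.strip())
    seen = set()
    kept = []
    for sentence in sentences:
        sentence = sentence.strip()
        # Compare sentences without trailing punctuation or case differences.
        key = sentence.rstrip(".!?").lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)


print(dedupe_caption("a man riding a wave. a man riding a wave. a man riding a wave"))
# a man riding a wave.
```

It could be applied to each decoded string, for example `[dedupe_caption(s) for s in generated_sentences]`.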

## Bias Warning
This model may be biased because of the dataset, the short training time, and the model itself. The following is an example of gender bias.
