---
datasets:
- MMInstruction/M3IT
pipeline_tag: image-to-text
---

This model is `Salesforce/blip-image-captioning-base` fine-tuned on the MMInstruction/M3IT instruction dataset.
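The training recipe itself is not documented in this card; the sketch below only illustrates what such instruction fine-tuning could look like. The toy example, field names, and hyperparameters are assumptions, not the actual setup:

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # assumed learning rate

# Hypothetical stand-in for the real M3IT examples; actual field names may differ.
instruction_data = [{
    "image": Image.open("file_name.jpg").convert("RGB"),
    "instruction": "Answer the following input according to the image.",
    "input": "Describe this image.",
    "output": "A dog playing in a park.",
}]

model.train()
for example in instruction_data:
    # Prompt and target are concatenated so the LM loss covers the full sequence.
    text = (f"Instruction: {example['instruction']}\n"
            f"Input: {example['input']}\n"
            f"output: {example['output']}")
    batch = processor(images=example["image"], text=text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Masking the prompt tokens in `labels` with `-100` would restrict the loss to the response alone; whether that was done here is not stated.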
## Usage

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

# Load the base processor and the instruction-tuned checkpoint.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
if processor.tokenizer.eos_token is None:
    # Fallback EOS token in case the tokenizer ships without one.
    processor.tokenizer.eos_token = '<|eos|>'
model = BlipForConditionalGeneration.from_pretrained("prasanna2003/Instruct-blip-v2")

image = Image.open('file_name.jpg').convert('RGB')

# The model expects this Instruction / Input / output prompt format.
prompt = """Instruction: Answer the following input according to the image.
Input: Describe this image.
output: """

inputs = processor(image, prompt, return_tensors="pt")
output = model.generate(**inputs, max_length=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
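
If a GPU is available, the same snippet runs there with only device placement added; `max_new_tokens` bounds just the generated continuation, unlike `max_length`. A sketch reusing the objects defined above:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# `processor`, `model`, `image`, and `prompt` come from the snippet above.
inputs = processor(image, prompt, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```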