---
language:
- en
tags:
- image-to-text
---

## lokibots/vit-patch16-1280-gpt2-large-image-summary

This model generates a text summary from a chart image. It accepts an image of size 1280x768 or smaller and produces a summary describing the image's contents.

**However, training is still required.**

## Sample inference code

```python
from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, GPT2Tokenizer
from PIL import Image

# Load the encoder-decoder model, its image preprocessor, and the GPT-2 tokenizer.
model = VisionEncoderDecoderModel.from_pretrained("lokibots/vit-patch16-1280-gpt2-large-image-summary")
feature_extractor = ViTFeatureExtractor.from_pretrained("lokibots/vit-patch16-1280-gpt2-large-image-summary")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")

# "image_file" is a placeholder; replace it with the path to a chart image.
image = Image.open("image_file").convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# Generate with beam search and decode the token ids back into text.
gen_kwargs = {"max_length": 1024, "num_beams": 4}
output_ids = model.generate(pixel_values, **gen_kwargs)
preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```
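
## Fitting larger images to the size limit

Since the model accepts images of 1280x768 or smaller, a larger chart may need to be scaled down first. The following is a minimal sketch, not part of the original model code: `fit_within` is a hypothetical helper, and it assumes that 1280 is the width limit and 768 the height limit. The feature extractor may already resize inputs on its own, in which case this step is unnecessary.

```python
def fit_within(width, height, max_width=1280, max_height=768):
    """Return a (width, height) pair that fits the limits, keeping aspect ratio.

    Hypothetical helper: the defaults assume the 1280x768 limit is width x height.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Images already inside the limits are returned unchanged (no upscaling).
    if width <= max_width and height <= max_height:
        return width, height

    # Compare aspect ratios with integer arithmetic to avoid float rounding.
    if width * max_height >= height * max_width:
        # Width is the binding constraint: pin it to max_width.
        return max_width, max(1, height * max_width // width)

    # Otherwise height is the binding constraint: pin it to max_height.
    return max(1, width * max_height // height), max_height
```

With Pillow, the result can be passed straight to `Image.resize`, for example `image = image.resize(fit_within(*image.size))`, before the image is handed to the feature extractor.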