IDEFICS2-OCR

A finetune of Idefics2-8b, trained with fp16 weight updates on the nielsr/docvqa_1200_examples_donut dataset of document VQA pairs.

Usage

import torch
from transformers import BitsAndBytesConfig, AutoModelForVision2Seq, AutoProcessor
from transformers.image_utils import load_image

processor = AutoProcessor.from_pretrained("smishr-18/Idefics2-OCR", do_image_splitting=False)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = AutoModelForVision2Seq.from_pretrained(
    "smishr-18/Idefics2-OCR",
    quantization_config=bnb_config,
    device_map=device,
    low_cpu_mem_usage=True
)

image = load_image("https://images.pokemontcg.io/pl1/1_hires.png")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Explain."},
            {"type": "image"},
            {"type": "text", "text": "What is the reflex energy in the image?"}
        ]
    }
]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text.strip()], images=[image], return_tensors="pt", padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Generate texts
generated_ids = model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_texts)
# The reflex energy in the image is 70.
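
The decoded strings typically contain the prompt as well as the answer. The following is a minimal sketch for isolating the answer, assuming the decoded text follows a "User: ... Assistant: ..." layout; `extract_answer` and the `sample` string are hypothetical illustrations, not part of transformers or actual model output.

```python
def extract_answer(decoded: str, marker: str = "Assistant:") -> str:
    """Return the text after the last marker, or the whole string if the marker is absent."""
    # rpartition splits on the LAST occurrence, so earlier turns are discarded.
    _, sep, tail = decoded.rpartition(marker)
    return tail.strip() if sep else decoded.strip()


# Hypothetical decoded string, shaped like one entry of generated_texts.
sample = (
    "User: Explain. What is the reflex energy in the image? \n"
    "Assistant: The reflex energy in the image is 70."
)
print(extract_answer(sample))
```

With the usage code above, this could be applied as `[extract_answer(t) for t in generated_texts]`.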

Limitations

The model was finetuned on a single T4 GPU with limited memory. It could be finetuned further, with more adapters, on Ampere or newer GPUs (devices where torch.cuda.get_device_capability()[0] >= 8).
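
The capability check mentioned above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the choice of bfloat16 on capability 8 or higher is an assumption about how one might use the check, not the author's training setup. A T4 reports capability 7.5, so it falls back to float16.

```python
from typing import Optional


def pick_dtype_name(capability_major: int) -> str:
    """Assumed policy: bfloat16 on Ampere or newer (major >= 8), else float16."""
    return "bfloat16" if capability_major >= 8 else "float16"


def detect_capability_major() -> Optional[int]:
    """Return the CUDA compute capability major version, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()[0]


major = detect_capability_major()
if major is None:
    print("No CUDA device detected")
else:
    print(f"Capability major {major}: suggested dtype {pick_dtype_name(major)}")
```

The returned name could then be mapped to a torch dtype with `getattr(torch, name)` and passed as `bnb_4bit_compute_dtype`.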

  • Developed by: Shubh Mishra, Aug 2024
  • Model Type: VLM
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: HuggingFaceM4/idefics2-8b