BLIPNet Model
This is the structure of the BLIPNet model. You can load the pretrained weights into this structure, or extend it into a larger model for your specific task (see the sketch after the model definition below).
Model Structure
import torch
import torch.nn as nn
from transformers import BlipForConditionalGeneration

class BLIPNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Generation model
        self.model = BlipForConditionalGeneration.from_pretrained(
            "Salesforce/blip-image-captioning-base", cache_dir="model"
        )
        # Same as https://huggingface.co/uf-aice-lab/BLIP-Math
        # Flattened vision embedding: 577 tokens x 768 hidden dims = 443136
        self.ebd_dim = 443136
        # Classification head
        fc_dim = 64  # You can choose a larger size for better performance, e.g., 1024.
        self.head = nn.Sequential(
            nn.Linear(self.ebd_dim, fc_dim),
            nn.ReLU(),
        )
        self.output1 = nn.Linear(fc_dim, 5)  # 5 classes

    def forward(self, pixel_values, input_ids):
        outputs = self.model(input_ids=input_ids, pixel_values=pixel_values, labels=input_ids)
        image_text_embeds = self.model.vision_model(pixel_values, return_dict=True).last_hidden_state
        image_text_embeds = self.head(image_text_embeds.view(-1, self.ebd_dim))
        # The classification head runs on embeddings from the generative model,
        # leveraging BLIP's image-text encoding capabilities.
        logits = self.output1(image_text_embeds)
        # generation outputs (loss and token logits), raw classification logits
        return outputs, logits

model = BLIPNet()
model.load_state_dict(torch.load("BLILP_Generation_Classification.bin"), strict=False)
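As mentioned above, you can scale the classification head for your own task. Below is a minimal sketch of a larger variant; the class name BLIPNetLarge and the layer sizes are illustrative assumptions, not part of the released model. Note that the released checkpoint's head weights will not match these shapes, so the head must be trained on your own data.

# Hypothetical larger variant: a wider, two-layer classification head.
# Layer sizes are illustrative; the released checkpoint's head weights
# will not fit these shapes, so train the head on your own data.
class BLIPNetLarge(BLIPNet):
    def __init__(self, fc_dim=1024, num_classes=5):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(self.ebd_dim, fc_dim),
            nn.ReLU(),
            nn.Linear(fc_dim, fc_dim),
            nn.ReLU(),
        )
        self.output1 = nn.Linear(fc_dim, num_classes)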
Prepare your input in the same way as the example provided at https://huggingface.co/uf-aice-lab/BLIP-Math. A single forward pass then yields the generation outputs and the classification score simultaneously.
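For reference, here is a minimal inference sketch. The file name and prompt text are placeholder assumptions, and the processor checkpoint is assumed to match the base model loaded above; follow the linked example for the exact preprocessing.

import torch
from PIL import Image
from transformers import BlipProcessor

# Assumption: the processor matches the base checkpoint used above.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("sample.png").convert("RGB")  # hypothetical input image
text = "student response"                        # hypothetical prompt text
inputs = processor(images=image, text=text, return_tensors="pt")

model.eval()
with torch.no_grad():
    outputs, logits = model(inputs["pixel_values"], inputs["input_ids"])
    probs = torch.softmax(logits, dim=-1)  # classification probabilities

    # Generated text comes from the underlying BLIP model's generate().
    generated_ids = model.model.generate(pixel_values=inputs["pixel_values"])
    caption = processor.decode(generated_ids[0], skip_special_tokens=True)

print(caption, probs)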