BLIPNet Model

This is the structure of the BLIPNet model. You can load the pretrained weights into this structure, or extend it into a larger model for your specific task.

Model Structure

import torch
import torch.nn as nn
from transformers import BlipForConditionalGeneration

class BLIPNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Generation Model
        self.model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base", cache_dir="model")
        # Same as https://huggingface.co/uf-aice-lab/BLIP-Math
        self.ebd_dim = 443136  # 577 vision tokens x 768 hidden dims, flattened
        
        # Classification Model
        fc_dim = 64  # a larger width (e.g., 1024) may improve performance
        self.head = nn.Sequential(
            nn.Linear(self.ebd_dim, fc_dim),
            nn.ReLU(), 
        )
        self.output1 = nn.Linear(fc_dim, 5)  # 5 classes
        
    def forward(self, pixel_values, input_ids):
        # Teacher-forced pass through the generative model (labels yield a language-modeling loss)
        outputs = self.model(input_ids=input_ids, pixel_values=pixel_values, labels=input_ids)
        # Flatten the vision-encoder embeddings for the classification head
        image_text_embeds = self.model.vision_model(pixel_values, return_dict=True).last_hidden_state
        image_text_embeds = self.head(image_text_embeds.view(-1, self.ebd_dim))

        # The classification head reuses embeddings from the generative model, leveraging BLIP's image-text encoding
        logits = self.output1(image_text_embeds)

        # generation outputs, classification logits
        return outputs, logits
        
model = BLIPNet()
# strict=False tolerates key mismatches between the checkpoint and this wrapper
model.load_state_dict(torch.load("BLILP_Generation_Classification.bin"), strict=False)
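
For reference, the embedding size 443136 comes from flattening the BLIP-base vision encoder output: 577 tokens (576 image patches plus one [CLS] token at 384x384 resolution) times a hidden width of 768. A minimal sanity-check sketch, using random pixel values as a stand-in for a real image:

dummy_pixels = torch.randn(1, 3, 384, 384)  # BLIP-base expects 384x384 inputs
with torch.no_grad():
    vision_out = model.model.vision_model(dummy_pixels, return_dict=True).last_hidden_state
print(vision_out.shape)  # expected: torch.Size([1, 577, 768]); 577 * 768 = 443136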

Prepare your inputs in the same way as the example at https://huggingface.co/uf-aice-lab/BLIP-Math. A single forward pass then returns the generation outputs and the classification logits simultaneously.
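
Below is a minimal usage sketch; the image path and question text are illustrative assumptions, so follow the BLIP-Math example above for the exact preprocessing:

from PIL import Image
from transformers import BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
image = Image.open("sample.png").convert("RGB")  # hypothetical image file
inputs = processor(images=image, text="a math question", return_tensors="pt")  # hypothetical prompt

model.eval()
with torch.no_grad():
    outputs, logits = model(pixel_values=inputs.pixel_values, input_ids=inputs.input_ids)

predicted_class = logits.argmax(dim=-1).item()  # index of the predicted class (0-4)
# For free-running text generation, call the wrapped BLIP model directly:
generated_ids = model.model.generate(pixel_values=inputs.pixel_values)
generated_text = processor.decode(generated_ids[0], skip_special_tokens=True)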