Output differs between local and Hugging Face web inference
The results differ when running the model locally versus on the Hugging Face hosted inference widget:
on the web widget the results are good, but on a local run the model fails to distinguish between objects and colors (about 90% of the answers are incorrect locally, while on the web widget almost 100% are correct).
Here is my code (adapted from the documentation):
# Path to a local fine-tuned fashion CLIP checkpoint ('model path' is a placeholder).
fashion_model_path = 'model path'
fashion_model = CLIPModel.from_pretrained(fashion_model_path)
# NOTE(review): the processor MUST come from the same checkpoint directory as the
# model — a mismatched preprocessor_config.json (different resize/normalize stats)
# silently degrades accuracy and is the classic cause of "works on the hosted
# widget, fails locally". Verify the local directory contains the checkpoint's own
# preprocessor_config.json / tokenizer files, not ones copied from base CLIP.
fashion_processor = CLIPProcessor.from_pretrained(fashion_model_path)
# Candidate text labels ('text1'… are placeholders).
# NOTE(review): name is singular `class_name` here but the function below takes
# plural `class_names` — confirm callers actually pass this list.
class_name = ['text1', 'text2', 'text3']
async def classify_image(image_path, class_names):
    """Zero-shot classify an image against candidate text labels with CLIP.

    The image bytes are read asynchronously with aiofiles; the blocking
    CLIP forward pass is pushed onto a worker thread via asyncio.to_thread
    so the event loop is never stalled.

    Args:
        image_path: Path to the image file on disk.
        class_names: List of candidate label strings.

    Returns:
        Tuple of (best_label, best_probability) for the highest-scoring label.
    """
    async with aiofiles.open(image_path, 'rb') as handle:
        raw_bytes = await handle.read()
    # Decode in-memory and force RGB so the processor sees 3 channels.
    pil_image = Image.open(io.BytesIO(raw_bytes)).convert('RGB')

    def _predict():
        # Tokenize the labels and preprocess the image in one batch.
        batch = fashion_processor(
            text=class_names,
            images=pil_image,
            return_tensors='pt',
            padding=True,
        )
        # Inference only — no gradients needed.
        with torch.no_grad():
            model_out = fashion_model(**batch)
        # Image-to-text similarity logits, softmaxed into probabilities
        # over the candidate labels.
        scores = model_out.logits_per_image.softmax(dim=1)[0]
        best = scores.argmax()
        return class_names[best], scores[best].item()

    return await asyncio.to_thread(_predict)
What could be the reason? I've been trying to figure this out for three days now.