Model for "GeoLangBind", is still in development, not the final version

Model card for GeoLB-ViT-B-16-SigLIP

GeoLangBind: Unifying Earth Observation Modalities with Vision-Language Foundation Models

Details are coming soon.

Model Details

  • Model Type: Contrastive Image-Text, Zero-Shot Image Classification.
  • Dataset: Earth observation image-text datasets
  • Papers:
    • GeoLangBind: Unifying Earth Observation Modalities with Vision-Language Foundation Models

Model Usage


Install the geolb_open_clip package first: https://github.com/ShadowXZT/geolb_open_clip.git


With OpenCLIP

import torch
import torch.nn.functional as F
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer # works on open-clip-torch>=2.23.0, timm>=0.9.8

model, preprocess = create_model_from_pretrained('hf-hub:XShadow/GeoLB-ViT-B-16-SigLIP')
tokenizer = get_tokenizer('hf-hub:XShadow/GeoLB-ViT-B-16-SigLIP')

image = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
image = preprocess(image).unsqueeze(0)

labels_list = ["a dog", "a cat", "a donut", "a beignet"]
text = tokenizer(labels_list, context_length=model.context_length)

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    text_probs = torch.sigmoid(image_features @ text_features.T * model.logit_scale.exp() + model.logit_bias)

zipped_list = list(zip(labels_list, [round(p.item(), 3) for p in text_probs[0]]))
print("Label probabilities: ", zipped_list)

With timm (for image embeddings)

from urllib.request import urlopen
from PIL import Image
import timm

image = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'dofa_vit_base_patch16_siglip_224',
    pretrained=True,
    num_classes=0,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(image).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

Citation

Downloads last month
2,302
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no pipeline_tag.