import streamlit as st

# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)

# Main Title
st.markdown('Image Zero Shot Classification with CLIP', unsafe_allow_html=True)

# Description
st.markdown("""

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on image-text pairs. It can classify images without hard-coded labels, which makes it highly flexible: candidate labels are simply provided at inference time, much like the zero-shot capabilities of the GPT-2 and GPT-3 models.

This model was imported from Hugging Face Transformers: CLIP Model on Hugging Face

""", unsafe_allow_html=True) # How to Use st.markdown('
How to Use the Model
', unsafe_allow_html=True) st.code(''' import sparknlp from sparknlp.base import * from sparknlp.annotator import * from pyspark.ml import Pipeline # Load image data imageDF = spark.read \\ .format("image") \\ .option("dropInvalid", value = True) \\ .load("src/test/resources/image/") # Define Image Assembler imageAssembler: ImageAssembler = ImageAssembler() \\ .setInputCol("image") \\ .setOutputCol("image_assembler") # Define candidate labels candidateLabels = [ "a photo of a bird", "a photo of a cat", "a photo of a dog", "a photo of a hen", "a photo of a hippo", "a photo of a room", "a photo of a tractor", "a photo of an ostrich", "a photo of an ox"] # Define CLIP classifier imageClassifier = CLIPForZeroShotClassification \\ .pretrained() \\ .setInputCols(["image_assembler"]) \\ .setOutputCol("label") \\ .setCandidateLabels(candidateLabels) # Create pipeline pipeline = Pipeline().setStages([imageAssembler, imageClassifier]) # Apply pipeline to image data pipelineDF = pipeline.fit(imageDF).transform(imageDF) # Show results pipelineDF \\ .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "label.result") \\ .show(truncate=False) ''', language='python') # Results st.markdown('
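
st.markdown("""
One possible follow-up, not part of the original card: recent Spark NLP releases expose a LightPipeline helper with a fullAnnotateImage method, which can classify a single local image without building a DataFrame first. The snippet below is a sketch under that assumption.
""", unsafe_allow_html=True)

st.code('''
from sparknlp.base import LightPipeline

# Fit the same pipeline, then wrap it for lightweight, single-image inference
model = pipeline.fit(imageDF)
light_pipeline = LightPipeline(model)

# fullAnnotateImage returns one annotation dict per input image,
# keyed by the pipeline's output columns (here: "label")
annotations = light_pipeline.fullAnnotateImage("src/test/resources/image/hen.JPEG")
print(annotations[0]["label"])
''', language='python')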
# Results
st.markdown('Results', unsafe_allow_html=True)

st.markdown("""
| Image Name        | Result                  |
|-------------------|-------------------------|
| palace.JPEG       | [a photo of a room]     |
| egyptian_cat.jpeg | [a photo of a cat]      |
| hippopotamus.JPEG | [a photo of a hippo]    |
| hen.JPEG          | [a photo of a hen]      |
| ostrich.JPEG      | [a photo of an ostrich] |
| junco.JPEG        | [a photo of a bird]     |
| bluetick.jpg      | [a photo of a dog]      |
| chihuahua.jpg     | [a photo of a dog]      |
| tractor.JPEG      | [a photo of a tractor]  |
| ox.JPEG           | [a photo of an ox]      |
""", unsafe_allow_html=True)
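
st.markdown("""
The table above mirrors the output of the pipeline's show() call. As a rough sketch (not taken from the original card), the same two columns can also be collected into a pandas DataFrame for further processing:
""", unsafe_allow_html=True)

st.code('''
# Collect image names and predicted labels into pandas
results_pdf = pipelineDF \\
    .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "label.result as result") \\
    .toPandas()

print(results_pdf)
''', language='python')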
""", unsafe_allow_html=True) # Model Information st.markdown('
Model Information
', unsafe_allow_html=True) st.markdown("""
| Attribute     | Description                                |
|---------------|--------------------------------------------|
| Model Name    | zero_shot_classifier_clip_vit_base_patch32 |
| Compatibility | Spark NLP 5.2.0+                           |
| License       | Open Source                                |
| Edition       | Official                                   |
| Input Labels  | [image_assembler]                          |
| Output Labels | [classification]                           |
| Language      | en                                         |
| Size          | 392.8 MB                                   |
""", unsafe_allow_html=True)
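
st.markdown("""
The usage example above loads the default checkpoint via pretrained(). A minimal sketch for pinning the exact model listed in the table, assuming Spark NLP's standard pretrained(name, lang) signature:
""", unsafe_allow_html=True)

st.code('''
# Load the specific checkpoint by name and language
imageClassifier = CLIPForZeroShotClassification \\
    .pretrained("zero_shot_classifier_clip_vit_base_patch32", "en") \\
    .setInputCols(["image_assembler"]) \\
    .setOutputCol("label") \\
    .setCandidateLabels(candidateLabels)
''', language='python')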
""", unsafe_allow_html=True) # Data Source Section st.markdown('
Data Source
', unsafe_allow_html=True) st.markdown("""

The CLIP model is available on Hugging Face. This model was trained on image-text pairs and can be used for zero-shot image classification.

""", unsafe_allow_html=True) # References st.markdown('
# References
st.markdown('References', unsafe_allow_html=True)

st.markdown("""
""", unsafe_allow_html=True)

# Community & Support
st.markdown('Community & Support', unsafe_allow_html=True)

st.markdown("""
""", unsafe_allow_html=True)