import streamlit as st
# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)
# Main Title
st.markdown('<h1>Image Zero Shot Classification with CLIP</h1>', unsafe_allow_html=True)
# Description
st.markdown("""
CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on image and text pairs. It has the capability to classify images without requiring hard-coded labels, making it highly flexible. Labels can be provided during inference, similar to the zero-shot capabilities of GPT-2 and GPT-3 models.
This model was imported from Hugging Face Transformers: CLIP Model on Hugging Face
""", unsafe_allow_html=True)
# How to Use
st.markdown('<h2>How to Use the Model</h2>', unsafe_allow_html=True)
st.code('''
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP
spark = sparknlp.start()

# Load image data
imageDF = spark.read \\
    .format("image") \\
    .option("dropInvalid", value=True) \\
    .load("src/test/resources/image/")

# Define Image Assembler
imageAssembler: ImageAssembler = ImageAssembler() \\
    .setInputCol("image") \\
    .setOutputCol("image_assembler")

# Define candidate labels
candidateLabels = [
    "a photo of a bird",
    "a photo of a cat",
    "a photo of a dog",
    "a photo of a hen",
    "a photo of a hippo",
    "a photo of a room",
    "a photo of a tractor",
    "a photo of an ostrich",
    "a photo of an ox"]

# Define CLIP classifier
imageClassifier = CLIPForZeroShotClassification \\
    .pretrained() \\
    .setInputCols(["image_assembler"]) \\
    .setOutputCol("label") \\
    .setCandidateLabels(candidateLabels)

# Create pipeline
pipeline = Pipeline().setStages([imageAssembler, imageClassifier])

# Apply pipeline to image data
pipelineDF = pipeline.fit(imageDF).transform(imageDF)

# Show results
pipelineDF \\
    .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "label.result") \\
    .show(truncate=False)
''', language='python')
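# Optional: single-image inference with LightPipeline (hedged sketch)
st.markdown("""
For quick checks on a single image outside of a Spark DataFrame, a LightPipeline sketch along these lines should also work, assuming your Spark NLP version supports `fullAnnotateImage` for image pipelines; the image path below is only a placeholder:
""", unsafe_allow_html=True)
st.code('''
from sparknlp.base import LightPipeline

# Wrap the fitted pipeline model for lightweight, local inference
light_pipeline = LightPipeline(pipeline.fit(imageDF))

# fullAnnotateImage accepts a single image path (or a list of paths)
annotations = light_pipeline.fullAnnotateImage("src/test/resources/image/hippopotamus.JPEG")
print(annotations[0]["label"])
''', language='python')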
# Results
st.markdown('<h2>Results</h2>', unsafe_allow_html=True)
st.markdown("""
Image Name |
Result |
palace.JPEG |
[a photo of a room] |
egyptian_cat.jpeg |
[a photo of a cat] |
hippopotamus.JPEG |
[a photo of a hippo] |
hen.JPEG |
[a photo of a hen] |
ostrich.JPEG |
[a photo of an ostrich] |
junco.JPEG |
[a photo of a bird] |
bluetick.jpg |
[a photo of a dog] |
chihuahua.jpg |
[a photo of a dog] |
tractor.JPEG |
[a photo of a tractor] |
ox.JPEG |
[a photo of an ox] |
""", unsafe_allow_html=True)
# Model Information
st.markdown('<h2>Model Information</h2>', unsafe_allow_html=True)
st.markdown("""
Attribute |
Description |
Model Name |
zero_shot_classifier_clip_vit_base_patch32 |
Compatibility |
Spark NLP 5.2.0+ |
License |
Open Source |
Edition |
Official |
Input Labels |
[image_assembler] |
Output Labels |
[classification] |
Language |
en |
Size |
392.8 MB |
""", unsafe_allow_html=True)
# Data Source Section
st.markdown('<h2>Data Source</h2>', unsafe_allow_html=True)
st.markdown("""
The CLIP model is available on Hugging Face. This model was trained on image-text pairs and can be used for zero-shot image classification.
""", unsafe_allow_html=True)
# References
st.markdown('<h2>References</h2>', unsafe_allow_html=True)
st.markdown("""
""", unsafe_allow_html=True)
# Community & Support
st.markdown('Community & Support
', unsafe_allow_html=True)
st.markdown("""
- Official Website: Documentation and examples
- Slack: Live discussion with the community and team
- GitHub: Bug reports, feature requests, and contributions
- Medium: Spark NLP articles
- YouTube: Video tutorials
""", unsafe_allow_html=True)