import streamlit as st
# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)
# Main Title
st.markdown('<h1>Image Zero Shot Classification with CLIP</h1>', unsafe_allow_html=True)
# Description
st.markdown("""
CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on image and text pairs. It has the capability to classify images without requiring hard-coded labels, making it highly flexible. Labels can be provided during inference, similar to the zero-shot capabilities of GPT-2 and GPT-3 models.
This model was imported from Hugging Face Transformers: CLIP Model on Hugging Face
""", unsafe_allow_html=True)
# How to Use
st.markdown('<h2>How to Use the Model</h2>', unsafe_allow_html=True)
st.code('''
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP
spark = sparknlp.start()

# Load image data
imageDF = spark.read \\
    .format("image") \\
    .option("dropInvalid", value=True) \\
    .load("src/test/resources/image/")

# Define Image Assembler
imageAssembler: ImageAssembler = ImageAssembler() \\
    .setInputCol("image") \\
    .setOutputCol("image_assembler")

# Define candidate labels
candidateLabels = [
    "a photo of a bird",
    "a photo of a cat",
    "a photo of a dog",
    "a photo of a hen",
    "a photo of a hippo",
    "a photo of a room",
    "a photo of a tractor",
    "a photo of an ostrich",
    "a photo of an ox"]

# Define CLIP classifier
imageClassifier = CLIPForZeroShotClassification \\
    .pretrained() \\
    .setInputCols(["image_assembler"]) \\
    .setOutputCol("label") \\
    .setCandidateLabels(candidateLabels)

# Create pipeline
pipeline = Pipeline().setStages([imageAssembler, imageClassifier])

# Apply pipeline to image data
pipelineDF = pipeline.fit(imageDF).transform(imageDF)

# Show results
pipelineDF \\
    .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "label.result") \\
    .show(truncate=False)
''', language='python')
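# Optional: single-image inference with LightPipeline (hedged sketch)
st.markdown("""
For quick checks on a single image outside of a Spark DataFrame, a LightPipeline sketch along these lines should also work, assuming your Spark NLP version supports `fullAnnotateImage` for image pipelines; the image path below is only a placeholder:
""", unsafe_allow_html=True)
st.code('''
from sparknlp.base import LightPipeline

# Wrap the fitted pipeline model for lightweight, local inference
light_pipeline = LightPipeline(pipeline.fit(imageDF))

# fullAnnotateImage accepts a single image path (or a list of paths)
annotations = light_pipeline.fullAnnotateImage("src/test/resources/image/hippopotamus.JPEG")
print(annotations[0]["label"])
''', language='python')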
# Results
st.markdown('<h2>Results</h2>', unsafe_allow_html=True)
st.markdown("""
Image Name |
Result |
palace.JPEG |
[a photo of a room] |
egyptian_cat.jpeg |
[a photo of a cat] |
hippopotamus.JPEG |
[a photo of a hippo] |
hen.JPEG |
[a photo of a hen] |
ostrich.JPEG |
[a photo of an ostrich] |
junco.JPEG |
[a photo of a bird] |
bluetick.jpg |
[a photo of a dog] |
chihuahua.jpg |
[a photo of a dog] |
tractor.JPEG |
[a photo of a tractor] |
ox.JPEG |
[a photo of an ox] |
""", unsafe_allow_html=True)
# Model Information
st.markdown('<h2>Model Information</h2>', unsafe_allow_html=True)
st.markdown("""
Attribute |
Description |
Model Name |
zero_shot_classifier_clip_vit_base_patch32 |
Compatibility |
Spark NLP 5.2.0+ |
License |
Open Source |
Edition |
Official |
Input Labels |
[image_assembler] |
Output Labels |
[classification] |
Language |
en |
Size |
392.8 MB |
""", unsafe_allow_html=True)
# Data Source Section
st.markdown('<h2>Data Source</h2>', unsafe_allow_html=True)
st.markdown("""
The CLIP model is available on Hugging Face. This model was trained on image-text pairs and can be used for zero-shot image classification.
""", unsafe_allow_html=True)
# References
st.markdown('<h2>References</h2>', unsafe_allow_html=True)
st.markdown("""
""", unsafe_allow_html=True)
# Community & Support
st.markdown('Community & Support
', unsafe_allow_html=True)
st.markdown("""
- Official Website: Documentation and examples
- Slack: Live discussion with the community and team
- GitHub: Bug reports, feature requests, and contributions
- Medium: Spark NLP articles
- YouTube: Video tutorials
""", unsafe_allow_html=True)