[ICML 2025] Exploring CLIP Geometry: Likelihood & Conformity (Demo Inside)

#74
by Yossilevii100 - opened

Hi everyone 👋
We’re excited to share two ICML 2025 papers exploring the geometry of CLIP’s latent space. To make the ideas tangible, we built a Hugging Face demo that lets you explore them interactively 🚀.

📄 The Double Ellipsoid Geometry of CLIP (https://huggingface.co/papers/2411.14517)
Shows that CLIP’s image and text embeddings do not live on a single hypersphere, but rather form a distinct double-ellipsoid structure.
We introduce a conformity measure that reveals how common an image or caption is (mathematically proven!).
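
To make "how common a sample is" concrete, here is a minimal sketch, assuming conformity is taken as the average cosine similarity of a sample to a reference set of embeddings (see the paper for the exact definition). The function names and the random stand-in data are illustrative only; this is not the demo's code, and real use would replace the random arrays with CLIP image or text embeddings.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def conformity(queries, reference):
    """Average cosine similarity of each query to every reference embedding."""
    return (l2_normalize(queries) @ l2_normalize(reference).T).mean(axis=1)

def conformity_from_mean(queries, reference):
    """Same value via the mean reference direction (one dot product per query)."""
    mean_direction = l2_normalize(reference).mean(axis=0)
    return l2_normalize(queries) @ mean_direction

# Stand-in for CLIP embeddings (assumption): a cloud that is NOT centered at the origin.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8)) + 2.0

common = np.full(8, 2.0)   # points toward the bulk of the cloud
rare = -common             # points away from it
scores = conformity(np.stack([common, rare]), reference)
print("conformity (common, rare):", scores)
```

Because the mean of dot products equals the dot product with the mean, `conformity_from_mean` gives the same score without comparing against every reference sample.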

📄 Whitened CLIP as a Likelihood Surrogate of Images and Captions (https://huggingface.co/papers/2505.06934)
Applying a simple linear transformation (whitening) to the CLIP latent space lets one measure the likelihood of a sample simply from its norm (mathematically driven).
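
The idea can be sketched as follows, assuming plain PCA whitening fitted on a reference set and a standard normal model in the whitened space, so that the log-likelihood is a decreasing function of the squared norm. The helper names, the `eps` regularizer, and the synthetic data are illustrative assumptions, not the paper's or the demo's implementation.

```python
import numpy as np

def fit_whitening(embeddings, eps=1e-6):
    """Return (mean, W) so that (x - mean) @ W.T has roughly identity covariance."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    cov = centered.T @ centered / (len(embeddings) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # PCA whitening matrix: Lambda^{-1/2} V^T (eps guards against tiny eigenvalues).
    W = (eigvecs / np.sqrt(eigvals + eps)).T
    return mean, W

def gaussian_log_likelihood(x, mean, W):
    """Log-density of the whitened vector under N(0, I); depends on x only via its norm."""
    z = (x - mean) @ W.T
    d = z.shape[-1]
    return -0.5 * np.sum(z**2, axis=-1) - 0.5 * d * np.log(2 * np.pi)

# Stand-in for CLIP embeddings (assumption): an off-center, rotated ellipsoidal cloud.
rng = np.random.default_rng(0)
scales = np.linspace(0.5, 3.0, 16)
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
fake_embeddings = (rng.normal(size=(2000, 16)) * scales) @ rotation.T + 3.0

mean, W = fit_whitening(fake_embeddings)
whitened = (fake_embeddings - mean) @ W.T

typical = gaussian_log_likelihood(mean, mean, W)          # whitened norm is 0
atypical = gaussian_log_likelihood(mean + 10.0, mean, W)  # large whitened norm
print("log-likelihood (typical, atypical):", typical, atypical)
```

In this sketch, comparing two samples only requires comparing their whitened norms: the smaller the norm, the higher the modeled likelihood.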

🙌 Try it out
https://huggingface.co/spaces/Yossilevii100/CLIPLatent
Upload an image or text → see its conformity and likelihood scores.
You can also compare the likelihood of your own photo with a friend's and see which one CLIP considers more likely.

🌟 Why it matters
Provides a geometric perspective on CLIP embeddings.
Bridges theory (double ellipsoids) with practice: it reveals which samples are likely and which are not, which may be a source of image synthesis failure cases.
Opens the door to using CLIP embeddings as likelihood estimators and for robustness analysis.

We’d love your feedback, ideas, and extensions!
