---
title: README
emoji: π
colorFrom: gray
colorTo: blue
sdk: static
pinned: false
---
All Things ViTs: Understanding and Interpreting Attention in Vision (CVPR'23 tutorial)
By: Hila Chefer and Sayak Paul
Website: [atv.github.io](https://atv.github.io)
Abstract: In this tutorial, we explore different ways to leverage attention in vision. From left to right: (i) attention can be used to explain a model's predictions (e.g., CLIP for an image-text pair); (ii) by manipulating attention-based explainability maps, one can enforce that predictions are made for the right reasons (e.g., foreground vs. background); (iii) the cross-attention maps of multi-modal models can be used to guide generative models (e.g., mitigating neglect in Stable Diffusion).
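
As a rough illustration of the first use case (attention as an explanation), the sketch below extracts a ViT's self-attention maps and turns the CLS token's attention into a crude saliency map. It is not part of the tutorial's demo code; it assumes the Hugging Face `transformers` library and the `google/vit-base-patch16-224` checkpoint, and the tutorial itself covers more faithful attribution methods (e.g., attention rollout and gradient-weighted relevance).

```python
# Minimal sketch (not from the tutorial materials): inspect a ViT's
# self-attention for a simple explainability heatmap.
import numpy as np
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTModel.from_pretrained("google/vit-base-patch16-224")

# Placeholder input; replace with a real image of interest.
image = Image.fromarray(np.uint8(np.random.rand(224, 224, 3) * 255))
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# `attentions` is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
last_layer = outputs.attentions[-1]

# Average over heads and take the CLS token's attention to the patch tokens
# as a naive saliency map (14x14 patch grid for a 224px image, 16px patches).
cls_to_patches = last_layer.mean(dim=1)[0, 0, 1:]
saliency = cls_to_patches.reshape(14, 14)
print(saliency.shape)
```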
This organization hosts all the interactive demos to be presented at the tutorial. Below, you can find some of them.