import streamlit as st
from utils.frontend import build_sidebar

build_sidebar()
| st.markdown(""" | |
| # Better Image Retrieval With Retrieval-Augmented CLIP 🧠 | |
| [CLIP](https://openai.com/blog/clip/) is a neural network that can predict how semantically close images and text pairs are. | |
| In simpler terms, it can tell that the string "Cat" is closer to images of cats rather than images of dogs. | |
| What makes CLIP so powerful is that is a zero-shot model: that means that it can generalize concepts, | |
| understand text and images it has never seen before. For example, it can tell that the string "an animal with yellow eyes" | |
| is closer to images of cats rather than dogs, even though such pair was not in its training data. | |
| Why does this matter? Because zero shot capabilities allow models to understand descriptions. And in fact | |
| CLIP understands that "an animal with pink feathers" matches a flamingo better than a pig. | |
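For the curious, here is a minimal sketch of how such a comparison could be computed directly,
assuming the Hugging Face `transformers` library and the `openai/clip-vit-base-patch32` checkpoint
(the image files below are hypothetical; this demo accesses CLIP through Haystack instead):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Compare one description against two candidate images (hypothetical file names)
images = [Image.open("flamingo.jpg"), Image.open("pig.jpg")]
inputs = processor(
    text=["an animal with pink feathers"], images=images, return_tensors="pt", padding=True
)
# logits_per_text has one row per caption and one column per image:
# the flamingo column should get the higher score.
scores = model(**inputs).logits_per_text.softmax(dim=-1)
print(scores)
```
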
However, these descriptions need to be related to what the image shows. CLIP knows nothing about an animal's features,
history, or cultural references: it doesn't know which animals live longer than others, that jaguars were often depicted
in Aztec wall paintings, or that wolves and bears are typical animals in European fairy tales. It doesn't even
know that cheetahs are fast, because it cannot tell that from an image.

Wikipedia, however, contains all this information, and more. Can we make CLIP "look up" the answer to
our questions on Wikipedia before looking for matches?

In this demo application, we see how we can combine traditional Extractive QA on Wikipedia with CLIP, using Haystack.""")
| st.image("diagram.png") | |
| st.markdown(""" | |
| In the image above you can see how the process looks like. | |
| First, we download a slice of Wikipedia with information about all the animals in the Lisbon zoo and preprocess, | |
| index, embed and store them in a DocumentStore. For this demo we're using | |
| [FAISSDocumentStore](https://docs.haystack.deepset.ai/docs/document_store). | |
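As a rough sketch (using the Haystack 1.x API; the file name is hypothetical and the actual code is linked at the bottom of this page), the indexing step might look like this:

```python
from haystack import Document
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import PreProcessor

# Hypothetical file holding the downloaded Wikipedia text about the zoo's animals
raw_text = open("lisbon_zoo_animals.txt").read()

# Split the raw text into small, searchable passages
preprocessor = PreProcessor(split_by="word", split_length=100, split_overlap=10)
docs = preprocessor.process([Document(content=raw_text)])

# Store the passages; embeddings are added once a Retriever is defined (see below).
# The embedding dimension must match the embedding model used by that Retriever.
document_store = FAISSDocumentStore(embedding_dim=384, faiss_index_factory_str="Flat")
document_store.write_documents(docs)
```
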
At this point the documents are ready to be queried by the text Retriever, in this case an instance of
[EmbeddingRetriever](https://docs.haystack.deepset.ai/docs/retriever#embedding-retrieval-recommended).
It compares the user's question ("The fastest animal") to all the documents indexed earlier and returns the
documents that are most likely to contain an answer to the question.
In this case, it will probably return snippets from the Cheetah Wikipedia entry.
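Continuing the sketch above (the embedding model is just an example; it has to match the dimension the document store was created with):

```python
from haystack.nodes import EmbeddingRetriever

# `document_store` is the FAISSDocumentStore created in the previous sketch
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
)
# Compute and store an embedding for every indexed document
document_store.update_embeddings(retriever)

# Return the passages most similar to the question
candidate_docs = retriever.retrieve(query="The fastest animal", top_k=5)
```
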
Once the documents are found, they are handed over to the Reader (in this demo, a
[FARMReader](https://docs.haystack.deepset.ai/docs/reader) node):
a model that can locate the precise answer to a question within a document.
These answers are short strings that should now be very easy for CLIP to understand, such as the name of an animal.
In this case, the Reader will return answers such as "Cheetah", "the cheetah", etc.
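Again as a sketch (the QA model shown is one common choice, not necessarily the one used here):

```python
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

# Extract short answer spans from the passages retrieved in the previous step
prediction = reader.predict(query="What's the fastest animal?", documents=candidate_docs, top_k=3)
for answer in prediction["answers"]:
    print(answer.answer, answer.score)
```
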
These strings are then ranked, and the most likely one is sent over to the
[MultiModalRetriever](https://docs.haystack.deepset.ai/docs/retriever#multimodal-retrieval)
that contains CLIP, which uses its own document store of images to find all the pictures that match the string.
Cheetahs are present in the Lisbon zoo, so it will find pictures of them and return them.
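A text-to-image retrieval step like this could be sketched as follows with Haystack 1.x (the image paths and the in-memory store are illustrative; the real app's setup is linked below):

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.retriever.multimodal import MultiModalRetriever

# A separate store that holds the animal photos as image documents (hypothetical paths)
image_store = InMemoryDocumentStore(embedding_dim=512)
image_store.write_documents(
    [Document(content="images/cheetah_1.jpg", content_type="image")]
)

image_retriever = MultiModalRetriever(
    document_store=image_store,
    query_embedding_model="sentence-transformers/clip-ViT-B-32",
    query_type="text",
    document_embedding_models={"image": "sentence-transformers/clip-ViT-B-32"},
)
# Embed every stored image with CLIP
image_store.update_embeddings(retriever=image_retriever)

# CLIP compares the extracted answer to every stored image
images = image_retriever.retrieve(query="Cheetah", top_k=3)
```
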
These nodes are chained together using a [Pipeline](https://docs.haystack.deepset.ai/docs/pipelines) object,
so that all it takes to run a system like this is a single call: `pipeline.run(query="What's the fastest animal?")`
returns the list of images directly.
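For clarity, here is the wiring sketched as two explicit steps rather than one chained pipeline; the single-call pipeline this demo actually uses is in the implementation linked below:

```python
from haystack import Pipeline

# Text QA pipeline: question -> relevant passages -> short answers
qa_pipeline = Pipeline()
qa_pipeline.add_node(component=retriever, name="TextRetriever", inputs=["Query"])
qa_pipeline.add_node(component=reader, name="Reader", inputs=["TextRetriever"])

qa_result = qa_pipeline.run(query="What's the fastest animal?")
best_answer = qa_result["answers"][0].answer  # e.g. "Cheetah"

# Hand the extracted answer to CLIP to fetch matching pictures
images = image_retriever.retrieve(query=best_answer, top_k=3)
```
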
Have a look at [how we implemented it](https://github.com/TuanaCelik/find-the-animal/blob/main/utils/haystack.py)!
""")