
๐Ÿ›๏ธ RAG Image Captioning with Landmark Location

This model generates captions for images of monuments and landmarks using retrieval-augmented generation (RAG).

How it works:

  • Uses CLIP to extract image embeddings.
  • Retrieves top-k similar captions via FAISS.
  • Generates a detailed caption with name and location using T5.
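
The retrieval and prompt-building steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the toy 3-dimensional vectors stand in for CLIP image and caption embeddings, a brute-force cosine-similarity search stands in for the FAISS index, and the function names (`top_k_captions`, `build_prompt`) and the prompt prefix are hypothetical. The resulting prompt is what would then be passed to T5.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k_captions(query, index, k=2):
    """Return the k captions whose embeddings are most similar to the query.

    Brute-force stand-in for a FAISS nearest-neighbour search (assumption).
    """
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), caption)
        for vec, caption in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored[:k]]


def build_prompt(captions):
    """Join retrieved captions into one text prompt (hypothetical format)."""
    return "generate caption with name and location: " + " | ".join(captions)


# Toy vectors standing in for CLIP embeddings (illustrative values only).
caption_index = [
    ([0.9, 0.1, 0.0], "The Taj Mahal is a white marble mausoleum in Agra, India."),
    ([0.6, 0.6, 0.1], "A marble tomb on the bank of the Yamuna river."),
    ([0.0, 0.1, 0.9], "The Eiffel Tower is a wrought-iron tower in Paris, France."),
]
query_embedding = [0.85, 0.15, 0.05]  # would come from the CLIP image encoder

retrieved = top_k_captions(query_embedding, caption_index, k=2)
prompt = build_prompt(retrieved)
print(prompt)
```

In a real setup, the embeddings would have the dimensionality of the chosen CLIP model, and the brute-force loop would be replaced by a FAISS index built over the caption embeddings.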

Example

Input: 🏰 Image of the Taj Mahal
Output: "The place might be: Agra. The Taj Mahal is a white marble mausoleum located in Agra, India."