# Document-Based RAG AI

This project implements a Retrieval-Augmented Generation (RAG) architecture that extracts information from uploaded documents, retrieves the relevant passages, and answers user queries through a chat interface. The application uses a Flask-based web interface and a Chroma vector database for document indexing and retrieval.

---

## Problem Statement

Organizations often struggle to manage and query unstructured text spread across many documents. This application addresses the problem by building a searchable vector database of document contents, so that a query retrieves the most relevant passages and the response is generated from them.

---

## Setup

### 1. Virtual Environment

Set up a Python virtual environment to manage dependencies:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

### 2. Install Dependencies

Install the required packages from `requirements.txt`:

```bash
pip install -r requirements.txt
```

### 3. Running the Application

Start the Flask application:

```bash
python app.py
```

---

## Components Overview

### 1. [LangChain](https://github.com/hwchase17/langchain)

LangChain connects LLMs to retrieval systems such as vector databases.

### 2. [Flask](https://flask.palletsprojects.com/)

Flask provides the web framework for the user interface and the RESTful APIs.

### 3. [Chroma Vector Database](https://docs.trychroma.com/)

Chroma stores document embeddings and retrieves them for similarity-based querying.

### 4. RAG Architecture

Combines retrieval of relevant document chunks with an LLM, so that responses are generated from that retrieved context.

### 5. Models Used

- **Embedding Model:** `all-MiniLM-L6-v2` (via HuggingFace)
- **Chat Model:** `Mistral-7B-Instruct-v0.3`

---

## Application Workflow

### Overview

A typical RAG application has two components:

1. **Indexing**: Processes and indexes documents so they can be searched.
2. **Retrieval and Generation**: Retrieves relevant document chunks and generates a response based on them.

#### Indexing

1. **Load**: Upload documents using the web interface.
2. **Split**: Break documents into smaller chunks for indexing.
3. **Store**: Embed the chunks and save the embeddings in a Chroma vector database.

#### Retrieval and Generation

1. **Retrieve**: Search for the chunks most relevant to the user's query.
2. **Generate**: Produce a context-aware answer with the chat model.

---

## Application Features

1. **Create Database**
   - Upload documents and generate a searchable vector database.
2. **Update Database**
   - Add new documents to an existing vector database.
3. **Remove Database**
   - Delete an entire vector database.
4. **Delete Documents in Database**
   - Delete a specific document from a vector database.
5. **List Databases**
   - View all available vector databases.
6. **Chat Interface**
   - Select a vector database and query it.

---

## App Tree

```
.
├── app.py               # Flask application
├── retrival.py          # Data retrieval and vector database management
├── templates/
│   ├── home.html        # Home page template
│   ├── chat.html        # Chat interface template
│   ├── create_db.html   # Upload documents for database creation
│   └── list_dbs.html    # List available vector databases
├── uploads/             # Uploaded document storage
├── VectorDB/            # Vector database storage
├── TableDB/             # Table database storage
├── ImageDB/             # Image database storage
├── requirements.txt     # Python dependencies
├── .env                 # Environment variables (e.g., HuggingFace API key)
└── README.md            # Documentation
```

---

## Example Usage

Using the Flask web interface:

1. **Navigate to `/create-db`** to upload documents and generate a vector database.
2. **Navigate to `/list-db`** to view all available databases.
3. **Select a database** using `/select-db/`.
4. **Query a database** using `/chat` to retrieve relevant information and generate a response.
5. **Update a database** using `/update-dbs/` to add individual files or an entire folder of files.
6. **Remove a database** using `/remove-dbs/` to delete an entire database.
7. **Delete a document in a database** using `/delete-doc/` to remove a specific document from the database.

---

Happy experimenting! 🚀
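### Sketch: Splitting Documents into Chunks

The **Split** step of the indexing workflow above breaks each document into overlapping chunks, so that a sentence cut at one chunk boundary still appears whole in the neighbouring chunk. The following is a minimal sketch of that idea, not the code in `retrival.py`: `split_text` is a hypothetical helper that cuts on raw character offsets, whereas LangChain's text splitters (which expose similar `chunk_size` and `chunk_overlap` parameters) also try to respect paragraph and sentence boundaries.

```python
def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical, simplified stand-in for a LangChain text splitter:
    it ignores paragraph and sentence boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts chunk_size - chunk_overlap characters after the last.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
    # ['abcd', 'cdef', 'efgh', 'ghij']
```

Larger overlaps make it less likely that an answer is split across two chunks, at the cost of storing more redundant embeddings.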
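### Sketch: Store, Retrieve, and Build the Prompt

The **Store**, **Retrieve**, and **Generate** steps described in the workflow above can be illustrated end to end without any external services. This is a hedged, self-contained sketch under stated assumptions: `embed` is a toy bag-of-words vector standing in for `all-MiniLM-L6-v2` (it only captures exact word overlap, not meaning), `InMemoryIndex` is a hypothetical stand-in for a Chroma collection, and `build_prompt` shows one possible prompt layout; the exact prompt this application sends to `Mistral-7B-Instruct-v0.3` is not specified here.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercased word counts (stand-in for all-MiniLM-L6-v2)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryIndex:
    """Hypothetical in-memory stand-in for a Chroma collection."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[Counter] = []

    def add(self, chunks: list[str]) -> None:
        # Store step: embed each chunk and keep it alongside its vector.
        for chunk in chunks:
            self.chunks.append(chunk)
            self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Retrieve step: rank all stored chunks by similarity to the query.
        q = embed(query)
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    # Generate step input: retrieved chunks are placed in the prompt
    # that would be sent to the chat model.
    joined = "\n\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    index = InMemoryIndex()
    index.add([
        "Chroma stores document embeddings for similarity search.",
        "Flask serves the web interface and REST endpoints.",
        "The chat model generates answers from retrieved context.",
    ])
    top = index.search("Which component stores embeddings?", k=1)
    print(top)  # ['Chroma stores document embeddings for similarity search.']
    print(build_prompt("Which component stores embeddings?", top))
```

A real embedding model would also match paraphrases (e.g. "vector store" against "embeddings"), which this word-count version cannot; the ranking-then-prompting structure is the part that carries over.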
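### Sketch: Managing Databases and Documents

The **Application Features** listed above (create, update, remove, delete a document, list) all operate on collections of chunks. The sketch below models them in memory to show one way **Delete Documents in Database** can work: if every chunk records the file it came from, deleting a document means deleting all chunks with that source. This is an assumption for illustration, not the logic in `retrival.py`; `VectorDBRegistry` and `Chunk` are hypothetical names. With Chroma, the equivalent would typically be a metadata-filtered delete on the collection, assuming the file path is stored under a `source` metadata key as LangChain document loaders commonly do.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    source: str  # originating file name, used to delete a whole document


@dataclass
class VectorDBRegistry:
    """Hypothetical in-memory model of the app's database operations."""

    dbs: dict[str, list[Chunk]] = field(default_factory=dict)

    def create_db(self, name: str, docs: dict[str, list[str]]) -> None:
        # Create Database: index a set of documents (file name -> chunks).
        if name in self.dbs:
            raise ValueError(f"database {name!r} already exists")
        self.dbs[name] = []
        self.update_db(name, docs)

    def update_db(self, name: str, docs: dict[str, list[str]]) -> None:
        # Update Database: append chunks from new documents.
        for source, chunks in docs.items():
            self.dbs[name].extend(Chunk(text, source) for text in chunks)

    def remove_db(self, name: str) -> None:
        # Remove Database: drop the whole collection.
        del self.dbs[name]

    def delete_doc(self, name: str, source: str) -> int:
        # Delete Document: remove every chunk that came from one file
        # and report how many chunks were removed.
        before = len(self.dbs[name])
        self.dbs[name] = [c for c in self.dbs[name] if c.source != source]
        return before - len(self.dbs[name])

    def list_dbs(self) -> list[str]:
        # List Databases: names of all available collections.
        return sorted(self.dbs)


if __name__ == "__main__":
    registry = VectorDBRegistry()
    registry.create_db("reports", {"q1.pdf": ["intro", "revenue"], "q2.pdf": ["summary"]})
    registry.update_db("reports", {"q3.pdf": ["outlook"]})
    print(registry.delete_doc("reports", "q1.pdf"))  # 2
    print(registry.list_dbs())  # ['reports']
```

Keying deletions on the source file keeps **Update Database** and **Delete Documents in Database** symmetric: whatever a file added, removing that file takes back out.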