WebashalarForML committed on
Commit 533a593 · verified · 1 Parent(s): 05a3944

Update README2.md

Files changed (1)
  1. README2.md +130 -0
README2.md CHANGED
@@ -0,0 +1,130 @@
+ # Document-Based RAG AI
+
+ This project implements a Retrieval-Augmented Generation (RAG) architecture that indexes uploaded documents, retrieves the passages relevant to a user query, and answers through a chat interface. The application uses a Flask-based web interface and a Chroma vector database for document indexing and retrieval.
+
+ ---
+
+ ## Problem Statement
+
+ Organizations often struggle to manage and query unstructured text spread across many documents. This application addresses that by building a searchable vector database of document contents, so that a query retrieves the relevant passages and the response is generated from them.
+
+ ---
+
+ ## Setup
+
+ ### 1. Virtual Environment
+ Set up a Python virtual environment to manage dependencies:
+ ```bash
+ python -m venv venv
+ source venv/bin/activate # On Windows: venv\Scripts\activate
+ ```
+
+ ### 2. Install Dependencies
+ Install required packages from the `requirements.txt` file:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Running the Application
+ Start the Flask application:
+ ```bash
+ python app.py
+ ```
+
+ ---
+
+ ## Components Overview
+
+ ### 1. [LangChain](https://github.com/hwchase17/langchain)
+ LangChain connects LLMs to retrieval systems such as vector databases.
+
+ ### 2. [Flask](https://flask.palletsprojects.com/)
+ Flask provides the web framework for the user interface and the RESTful APIs.
+
+ ### 3. [Chroma Vector Database](https://docs.trychroma.com/)
+ Chroma stores document embeddings and retrieves them by similarity search.
+
+ ### 4. RAG Architecture
+ RAG combines retrieval of relevant document chunks with an LLM, which generates a response grounded in the retrieved context.
+
+ ### 5. Models Used
+ - **Embedding Model:** `all-MiniLM-L6-v2` (via HuggingFace)
+ - **Chat Model:** `Mistral-7B-Instruct-v0.3`
+
+ ---
+
+ ## Application Workflow
+
+ ### Overview
+ A typical RAG application has two components:
+ 1. **Indexing**: Processes and indexes documents so they can be searched.
+ 2. **Retrieval and Generation**: Retrieves relevant document chunks and generates a context-based response.
+
+ #### Indexing
+ 1. **Load**: Upload documents using the web interface.
+ 2. **Split**: Break documents into smaller chunks for indexing.
+ 3. **Store**: Embed the chunks and save the embeddings in a Chroma vector database.
+
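The Split step can be sketched in plain Python. This is a minimal illustration, not the project's code: the function name `split_into_chunks` and the `chunk_size` and `overlap` values are assumptions, and the application itself may rely on a LangChain text splitter instead. Overlapping chunks are a common choice because a sentence cut at a chunk boundary still appears whole in one of the two neighbours.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij" * 3  # 30 characters of made-up text
    for chunk in split_into_chunks(sample, chunk_size=12, overlap=4):
        print(repr(chunk))
```

Each chunk would then be embedded and written to the vector database in the Store step.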
+ #### Retrieval and Generation
+ 1. **Retrieve**: Search for relevant chunks based on user queries.
+ 2. **Generate**: Produce context-aware answers using the Chat Model.
+
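The two steps above can be sketched with the standard library alone. This is a toy illustration under stated assumptions: word-count vectors stand in for the `all-MiniLM-L6-v2` embeddings, an in-memory list stands in for Chroma, and the function names, sample sentences, and prompt wording are all hypothetical. The resulting prompt is what would be passed to the chat model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the retrieved chunks and the question into a single prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    # Made-up chunks for illustration only.
    chunks = [
        "Invoices are due within 30 days of receipt.",
        "The office is closed on public holidays.",
        "Late invoices incur a 2% monthly fee.",
    ]
    question = "When are invoices due?"
    print(build_prompt(question, retrieve(question, chunks, k=2)))
```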
+ ---
+
+ ## Application Features
+
+ 1. **Create Database**
+    - Upload documents and generate a searchable vector database.
+
+ 2. **Update Database**
+    - Update the vector database by adding new documents.
+
+ 3. **Remove Database**
+    - Remove the vector database.
+
+ 4. **Delete Documents in Database**
+    - Delete a specific document from the vector database.
+
+ 5. **List Databases**
+    - View all available vector databases.
+
+ 6. **Chat Interface**
+    - Select a vector database and interact via queries.
+
+ ---
+
+ ## App Tree
+
+ ```
+ .
+ ├── app.py               # Flask application
+ ├── retrival.py          # Data retrieval and vector database management
+ ├── templates/
+ │   ├── home.html        # Home page template
+ │   ├── chat.html        # Chat interface template
+ │   ├── create_db.html   # Upload documents for database creation
+ │   └── list_dbs.html    # List available vector databases
+ ├── uploads/             # Uploaded document storage
+ ├── VectorDB/            # Vector database storage
+ ├── TableDB/             # Table database storage
+ ├── ImageDB/             # Image database storage
+ ├── requirements.txt     # Python dependencies
+ ├── .env                 # Environment variables (e.g., HuggingFace API key)
+ └── README.md            # Documentation
+ ```
+
+ ---
+
+ ## Example Usage with Flask
+
+ 1. **Navigate to `/create-db`** to upload documents and generate a vector database.
+ 2. **Navigate to `/list-db`** to view all available databases.
+ 3. **Select a database** using `/select-db/<db_name>`.
+ 4. **Query a database** using `/chat` to retrieve relevant information and generate a response.
+ 5. **Update a database** using `/update-dbs/<db_name>` to add files, or a whole folder of files, to the database.
+ 6. **Remove a database** using `/remove-dbs/<db_name>` to delete the entire database.
+ 7. **Delete a document** using `/delete-doc/<db_name>` to delete a specific document from the database.
+
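For scripting against these routes, a small helper can build the URLs. This is only a sketch: `BASE_URL` assumes Flask's default development address, the `ROUTES` keys and the `route_url` name are hypothetical, and the HTTP method and form fields each route expects are not stated here, so the helper stops at constructing the address.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:5000"  # assumption: Flask's default development address

# Routes from the list above; True marks routes that take a <db_name> segment.
ROUTES = {
    "create": ("/create-db", False),
    "list": ("/list-db", False),
    "select": ("/select-db", True),
    "chat": ("/chat", False),
    "update": ("/update-dbs", True),
    "remove": ("/remove-dbs", True),
    "delete_doc": ("/delete-doc", True),
}


def route_url(action: str, db_name: Optional[str] = None) -> str:
    """Build the full URL for an action, percent-encoding the database name."""
    path, needs_db = ROUTES[action]
    if needs_db:
        if not db_name:
            raise ValueError(f"{action!r} requires a database name")
        path = f"{path}/{quote(db_name, safe='')}"
    return BASE_URL + path


if __name__ == "__main__":
    print(route_url("list"))
    print(route_url("select", "my docs"))
```

Percent-encoding the name keeps spaces or slashes in a database name from being read as extra path segments.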
+ ---
+
+ Happy experimenting! 🚀