chalisesagun commited on
Commit
641fdfb
Β·
verified Β·
1 Parent(s): dd76442

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +174 -0
README.md CHANGED
@@ -11,4 +11,178 @@ license: apache-2.0
11
  short_description: deepseek-chat-bot
12
  ---
13
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
14
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
11
  short_description: deepseek-chat-bot
12
  ---
13
 
14
+ # RAG-Powered Chatbot with Streamlit
15
+
16
+ This project is a Retrieval-Augmented Generation (RAG) chatbot built using Streamlit. It allows users to upload a PDF document, process it, and ask questions about its content. The application efficiently processes the document once and uses vector-based retrieval to answer queries.
17
+
18
+ ---
19
+
20
+ ## Features
21
+
22
+ - Upload PDF documents and process them into chunks for efficient querying.
23
+ - Generate semantic embeddings using `sentence-transformers`.
24
+ - Store embeddings in a `FAISS` vector database for efficient retrieval.
25
+ - Use the `DeepSeek` API for question-answering capabilities.
26
+ - Built with Streamlit for an interactive and user-friendly UI.
27
+
28
+ ---
29
+
30
+ ## Requirements
31
+
32
+ - Python 3.8 or higher
33
+
34
+ ### Dependencies
35
+
36
+ Install the required Python libraries:
37
+
38
+ ```plaintext
39
+ streamlit==1.25.0
40
+ langchain==0.81.0
41
+ langchain-community==0.1.2
42
+ faiss-cpu==1.7.4
43
+ sentence-transformers==2.2.2
44
+ pypdf==3.8.1
45
+ ```
46
+
47
+ To install all dependencies:
48
+
49
+ ```bash
50
+ pip install -r requirements.txt
51
+ ```
52
+
53
+ ---
54
+
55
+ ## Setup and Usage
56
+
57
+ ### 1. Clone the Repository
58
+
59
+ ```bash
60
+ git clone https://github.com/your-username/rag-chatbot.git
61
+ cd rag-chatbot
62
+ ```
63
+
64
+ ### 2. Install Dependencies
65
+
66
+ ```bash
67
+ pip install -r requirements.txt
68
+ ```
69
+
70
+ ### 3. Run the Application
71
+
72
+ Run the Streamlit application:
73
+
74
+ ```bash
75
+ streamlit run app.py
76
+ ```
77
+
78
+ ### 4. Interact with the Chatbot
79
+
80
+ 1. Enter your `DeepSeek API Key` in the provided input field.
81
+ 2. Upload a PDF document.
82
+ 3. Ask questions about the content of the document.
83
+
84
+ ---
85
+
86
+ ## Project Structure
87
+
88
+ ```plaintext
89
+ .
90
+ β”œβ”€β”€ app.py # Main application code
91
+ β”œβ”€β”€ requirements.txt # List of dependencies
92
+ β”œβ”€β”€ README.md # Documentation
93
+ ```
94
+
95
+ ---
96
+
97
+ ## Key Technologies Used
98
+
99
+ 1. **Streamlit**:
100
+ - For building a user-friendly web interface.
101
+
102
+ 2. **LangChain**:
103
+ - For document loading, text splitting, and RAG pipeline.
104
+
105
+ 3. **FAISS**:
106
+ - For storing and querying vector embeddings.
107
+
108
+ 4. **Sentence Transformers**:
109
+ - For generating semantic embeddings of text chunks.
110
+
111
+ 5. **PyPDF**:
112
+ - For parsing PDF files.
113
+
114
+ 6. **DeepSeek API**:
115
+ - For question-answering capabilities.
116
+
117
+ ---
118
+
119
+ ## How It Works
120
+
121
+ 1. **PDF Upload**:
122
+ - The user uploads a PDF document.
123
+ - The document is split into manageable text chunks.
124
+
125
+ 2. **Embeddings Generation**:
126
+ - Semantic embeddings are generated using `sentence-transformers`.
127
+
128
+ 3. **Vector Storage**:
129
+ - The embeddings are stored in a `FAISS` vector database for efficient retrieval.
130
+
131
+ 4. **Question Answering**:
132
+ - The user asks a question about the uploaded document.
133
+ - The RAG pipeline retrieves relevant chunks and generates a response using the `DeepSeek` API.
134
+
135
+ ---
136
+
137
+ ## Troubleshooting
138
+
139
+ - **Error: `pypdf package not found`**
140
+ Ensure `pypdf` is installed. Run:
141
+ ```bash
142
+ pip install pypdf
143
+ ```
144
+
145
+ - **Error: `langchain-community module not found`**
146
+ Ensure `langchain-community` is installed. Run:
147
+ ```bash
148
+ pip install langchain-community
149
+ ```
150
+
151
+ - **Reprocessing PDF on Every Query**
152
+ This issue is resolved by using `st.session_state` to persist the processed `vector_store`.
153
+
154
+ ---
155
+
156
+ ## Future Improvements
157
+
158
+ 1. Add support for multiple file uploads.
159
+ 2. Integrate additional language models.
160
+ 3. Enhance the UI with better visualization of document content.
161
+ 4. Add support for other document formats (e.g., Word, TXT).
162
+
163
+ ---
164
+
165
+ ## License
166
+
167
+ This project is licensed under the MIT License. See the `LICENSE` file for more details.
168
+
169
+ ---
170
+
171
+ ## Contributions
172
+
173
+ Contributions are welcome! Feel free to fork the repository and submit a pull request.
174
+
175
+ ---
176
+
177
+ ## Contact
178
+
179
+ For any queries or support, please contact:
180
+
181
+ - Name: [Sagun Chalise]
182
+ - Email: [[email protected]]
183
+
184
+
185
+ ---
186
+
187
+
188
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference