Update README.md
README.md
@@ -11,4 +11,178 @@ license: apache-2.0
short_description: deepseek-chat-bot
---

# RAG-Powered Chatbot with Streamlit

This project is a Retrieval-Augmented Generation (RAG) chatbot built with Streamlit. Users upload a PDF document and ask questions about its content. The application processes the document once, then answers each query by retrieving relevant chunks from a vector index.

---

## Features

- Upload a PDF document and split it into chunks for efficient querying.
- Generate semantic embeddings using `sentence-transformers`.
- Store embeddings in a `FAISS` vector index for efficient retrieval.
- Use the `DeepSeek` API to answer questions.
- Built with Streamlit for an interactive and user-friendly UI.
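
To make the chunking feature concrete, here is a minimal sketch of splitting text into overlapping character windows. It is an illustration only: the function name `split_into_chunks` and the default sizes are assumptions, and the app itself may rely on a LangChain text splitter with different settings.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval.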

---

## Requirements

- Python 3.8 or higher

### Dependencies

The project requires the following Python libraries:

```plaintext
streamlit==1.25.0
langchain==0.81.0
langchain-community==0.1.2
faiss-cpu==1.7.4
sentence-transformers==2.2.2
pypdf==3.8.1
```

To install all dependencies:

```bash
pip install -r requirements.txt
```
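
After installing, it can be useful to confirm that each dependency is importable. The sketch below is one way to do that with the standard library. The helper name `missing_modules` is hypothetical, and the list uses import names, which differ from the pip names for some packages.

```python
import importlib.util
from typing import Iterable, List

# Import names differ from some pip names: faiss-cpu installs `faiss`,
# sentence-transformers installs `sentence_transformers`, and
# langchain-community installs `langchain_community`.
REQUIRED_MODULES = [
    "streamlit",
    "langchain",
    "langchain_community",
    "faiss",
    "sentence_transformers",
    "pypdf",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```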

---

## Setup and Usage

### 1. Clone the Repository

```bash
git clone https://github.com/your-username/rag-chatbot.git
cd rag-chatbot
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run the Application

Run the Streamlit application:

```bash
streamlit run app.py
```

### 4. Interact with the Chatbot

1. Enter your `DeepSeek API Key` in the provided input field.
2. Upload a PDF document.
3. Ask questions about the content of the document.

---

## Project Structure

```plaintext
.
├── app.py              # Main application code
├── requirements.txt    # List of dependencies
└── README.md           # Documentation
```

---

## Key Technologies Used

1. **Streamlit**:
   - For building a user-friendly web interface.

2. **LangChain**:
   - For document loading, text splitting, and RAG pipeline.

3. **FAISS**:
   - For storing and querying vector embeddings.

4. **Sentence Transformers**:
   - For generating semantic embeddings of text chunks.

5. **PyPDF**:
   - For parsing PDF files.

6. **DeepSeek API**:
   - For question-answering capabilities.
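
The division of labour between the embedding model and the vector index can be illustrated without either library. In the sketch below, a toy word-count embedding stands in for a Sentence Transformers model and a linear scan with cosine similarity stands in for a FAISS index. All function names are hypothetical, and the real app would produce dense vectors and use FAISS for the search.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: lower-cased word counts (stand-in for a real embedding model)."""
    return dict(Counter(text.lower().split()))


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first (linear scan)."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A linear scan compares the query against every chunk, which is fine for a handful of pages. An index such as FAISS exists to keep this search fast as the number of vectors grows.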

---

## How It Works

1. **PDF Upload**:
   - The user uploads a PDF document.
   - The document is split into manageable text chunks.

2. **Embeddings Generation**:
   - Semantic embeddings are generated using `sentence-transformers`.

3. **Vector Storage**:
   - The embeddings are stored in a `FAISS` vector index for efficient retrieval.

4. **Question Answering**:
   - The user asks a question about the uploaded document.
   - The RAG pipeline retrieves the chunks most relevant to the question and passes them, together with the question, to the `DeepSeek` API, which generates the response.
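
The final step, combining retrieved chunks with the user's question, can be sketched as a plain prompt builder. The function name `build_prompt` and the prompt wording are assumptions for illustration. The app may instead let a LangChain chain assemble the prompt internally.

```python
from typing import List


def build_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Assemble a single prompt string from numbered context chunks and the question."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

Numbering the chunks makes it easier to ask the model to point to the passage it relied on.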

---

## Troubleshooting

- **Error: `pypdf package not found`**
  Ensure `pypdf` is installed. Run:
  ```bash
  pip install pypdf
  ```

- **Error: `langchain-community module not found`**
  Ensure `langchain-community` is installed. Run:
  ```bash
  pip install langchain-community
  ```

- **Reprocessing PDF on Every Query**
  Streamlit reruns the script on every interaction, so the PDF would otherwise be reprocessed for each question. The app avoids this by storing the processed `vector_store` in `st.session_state`.
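
The session-state fix for repeated reprocessing follows a simple build-once pattern, sketched below. The helper name `get_or_build` is hypothetical. It accepts any dict-like object, so a plain `dict` can stand in for `st.session_state` when trying it outside Streamlit.

```python
from typing import Any, Callable, MutableMapping


def get_or_build(state: MutableMapping, key: str, build: Callable[[], Any]) -> Any:
    """Return state[key], calling build() only when the key is not yet present."""
    if key not in state:
        state[key] = build()  # expensive work happens once per session
    return state[key]
```

In the app this might be called as `get_or_build(st.session_state, "vector_store", lambda: build_index(pdf))`, where `build_index` and `pdf` are placeholder names. If users can upload a second document in the same session, including the file name in the key would avoid serving the old index.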

---

## Future Improvements

1. Add support for multiple file uploads.
2. Integrate additional language models.
3. Enhance the UI with better visualization of document content.
4. Add support for other document formats (e.g., Word, TXT).

---

## License

This project is licensed under the MIT License. See the `LICENSE` file for more details.

---

## Contributions

Contributions are welcome! Feel free to fork the repository and submit a pull request.

---

## Contact

For any queries or support, please contact:

- Name: [Sagun Chalise]
- Email: [[email protected]]

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference