File size: 3,961 Bytes
194cf55
f8c5d54
194cf55
 
 
 
 
 
 
 
f8c5d54
194cf55
 
641fdfb
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f8c5d54
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
---
title: Deepseek RAG Chat Bot
emoji: πŸ“ˆ
colorFrom: red
colorTo: pink
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Deepseek-RAG-Chat-Bot
---

# RAG-Powered Chatbot with Streamlit

This project is a Retrieval-Augmented Generation (RAG) chatbot built using Streamlit. It allows users to upload a PDF document, process it, and ask questions about its content. The application efficiently processes the document once and uses vector-based retrieval to answer queries.

---

## Features

- Upload PDF documents and process them into chunks for efficient querying.
- Generate semantic embeddings using `sentence-transformers`.
- Store embeddings in a `FAISS` vector database for efficient retrieval.
- Use the `DeepSeek` API for question-answering capabilities.
- Built with Streamlit for an interactive and user-friendly UI.

---

## Requirements

- Python 3.8 or higher

### Dependencies

Install the required Python libraries:

```plaintext
streamlit==1.25.0
langchain==0.81.0
langchain-community==0.1.2
faiss-cpu==1.7.4
sentence-transformers==2.2.2
pypdf==3.8.1
```

To install all dependencies:

```bash
pip install -r requirements.txt
```

---

## Setup and Usage

### 1. Clone the Repository

```bash
git clone https://github.com/your-username/rag-chatbot.git
cd rag-chatbot
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run the Application

Run the Streamlit application:

```bash
streamlit run app.py
```

### 4. Interact with the Chatbot

1. Enter your `DeepSeek API Key` in the provided input field.
2. Upload a PDF document.
3. Ask questions about the content of the document.

---

## Project Structure

```plaintext
.
β”œβ”€β”€ app.py              # Main application code
β”œβ”€β”€ requirements.txt    # List of dependencies
β”œβ”€β”€ README.md           # Documentation
```

---

## Key Technologies Used

1. **Streamlit**:
   - For building a user-friendly web interface.

2. **LangChain**:
   - For document loading, text splitting, and RAG pipeline.

3. **FAISS**:
   - For storing and querying vector embeddings.

4. **Sentence Transformers**:
   - For generating semantic embeddings of text chunks.

5. **PyPDF**:
   - For parsing PDF files.

6. **DeepSeek API**:
   - For question-answering capabilities.

---

## How It Works

1. **PDF Upload**:
   - The user uploads a PDF document.
   - The document is split into manageable text chunks.

2. **Embeddings Generation**:
   - Semantic embeddings are generated using `sentence-transformers`.

3. **Vector Storage**:
   - The embeddings are stored in a `FAISS` vector database for efficient retrieval.

4. **Question Answering**:
   - The user asks a question about the uploaded document.
   - The RAG pipeline retrieves relevant chunks and generates a response using the `DeepSeek` API.

---

## Troubleshooting

- **Error: `pypdf package not found`**
  Ensure `pypdf` is installed. Run:
  ```bash
  pip install pypdf
  ```

- **Error: `langchain-community module not found`**
  Ensure `langchain-community` is installed. Run:
  ```bash
  pip install langchain-community
  ```

- **Reprocessing PDF on Every Query**
  This issue is resolved by using `st.session_state` to persist the processed `vector_store`.

---

## Future Improvements

1. Add support for multiple file uploads.
2. Integrate additional language models.
3. Enhance the UI with better visualization of document content.
4. Add support for other document formats (e.g., Word, TXT).

---

## License

This project is licensed under the MIT License. See the `LICENSE` file for more details.

---

## Contributions

Contributions are welcome! Feel free to fork the repository and submit a pull request.

---

## Contact

For any queries or support, please contact:

- Name: [Sagun Chalise]
- Email: [[email protected]]


---


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference