Spaces:
Sleeping
Sleeping
File size: 3,961 Bytes
194cf55 f8c5d54 194cf55 f8c5d54 194cf55 641fdfb f8c5d54 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 |
---
title: Deepseek RAG Chat Bot
emoji: π
colorFrom: red
colorTo: pink
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Deepseek-RAG-Chat-Bot
---
# RAG-Powered Chatbot with Streamlit
This project is a Retrieval-Augmented Generation (RAG) chatbot built using Streamlit. It allows users to upload a PDF document, process it, and ask questions about its content. The application efficiently processes the document once and uses vector-based retrieval to answer queries.
---
## Features
- Upload PDF documents and process them into chunks for efficient querying.
- Generate semantic embeddings using `sentence-transformers`.
- Store embeddings in a `FAISS` vector database for efficient retrieval.
- Use the `DeepSeek` API for question-answering capabilities.
- Built with Streamlit for an interactive and user-friendly UI.
---
## Requirements
- Python 3.8 or higher
### Dependencies
Install the required Python libraries:
```plaintext
streamlit==1.25.0
langchain==0.81.0
langchain-community==0.1.2
faiss-cpu==1.7.4
sentence-transformers==2.2.2
pypdf==3.8.1
```
To install all dependencies:
```bash
pip install -r requirements.txt
```
---
## Setup and Usage
### 1. Clone the Repository
```bash
git clone https://github.com/your-username/rag-chatbot.git
cd rag-chatbot
```
### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
### 3. Run the Application
Run the Streamlit application:
```bash
streamlit run app.py
```
### 4. Interact with the Chatbot
1. Enter your `DeepSeek API Key` in the provided input field.
2. Upload a PDF document.
3. Ask questions about the content of the document.
---
## Project Structure
```plaintext
.
βββ app.py # Main application code
βββ requirements.txt # List of dependencies
βββ README.md # Documentation
```
---
## Key Technologies Used
1. **Streamlit**:
- For building a user-friendly web interface.
2. **LangChain**:
- For document loading, text splitting, and RAG pipeline.
3. **FAISS**:
- For storing and querying vector embeddings.
4. **Sentence Transformers**:
- For generating semantic embeddings of text chunks.
5. **PyPDF**:
- For parsing PDF files.
6. **DeepSeek API**:
- For question-answering capabilities.
---
## How It Works
1. **PDF Upload**:
- The user uploads a PDF document.
- The document is split into manageable text chunks.
2. **Embeddings Generation**:
- Semantic embeddings are generated using `sentence-transformers`.
3. **Vector Storage**:
- The embeddings are stored in a `FAISS` vector database for efficient retrieval.
4. **Question Answering**:
- The user asks a question about the uploaded document.
- The RAG pipeline retrieves relevant chunks and generates a response using the `DeepSeek` API.
---
## Troubleshooting
- **Error: `pypdf package not found`**
Ensure `pypdf` is installed. Run:
```bash
pip install pypdf
```
- **Error: `langchain-community module not found`**
Ensure `langchain-community` is installed. Run:
```bash
pip install langchain-community
```
- **Reprocessing PDF on Every Query**
This issue is resolved by using `st.session_state` to persist the processed `vector_store`.
---
## Future Improvements
1. Add support for multiple file uploads.
2. Integrate additional language models.
3. Enhance the UI with better visualization of document content.
4. Add support for other document formats (e.g., Word, TXT).
---
## License
This project is licensed under the MIT License. See the `LICENSE` file for more details.
---
## Contributions
Contributions are welcome! Feel free to fork the repository and submit a pull request.
---
## Contact
For any queries or support, please contact:
- Name: [Sagun Chalise]
- Email: [[email protected]]
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |