Spaces:

chalisesagun
/

deepseek-chat

Sleeping

File size: 3,961 Bytes

---
title: Deepseek RAG Chat Bot
emoji: 📈
colorFrom: red
colorTo: pink
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Deepseek-RAG-Chat-Bot
---

# RAG-Powered Chatbot with Streamlit

This project is a Retrieval-Augmented Generation (RAG) chatbot built using Streamlit. It allows users to upload a PDF document, process it, and ask questions about its content. The application efficiently processes the document once and uses vector-based retrieval to answer queries.

---

## Features

- Upload PDF documents and process them into chunks for efficient querying.
- Generate semantic embeddings using `sentence-transformers`.
- Store embeddings in a `FAISS` vector database for efficient retrieval.
- Use the `DeepSeek` API for question-answering capabilities.
- Built with Streamlit for an interactive and user-friendly UI.

---

## Requirements

- Python 3.8 or higher

### Dependencies

Install the required Python libraries:

```plaintext
streamlit==1.25.0
langchain==0.81.0
langchain-community==0.1.2
faiss-cpu==1.7.4
sentence-transformers==2.2.2
pypdf==3.8.1
```

To install all dependencies:

```bash
pip install -r requirements.txt
```

---

## Setup and Usage

### 1. Clone the Repository

```bash
git clone https://github.com/your-username/rag-chatbot.git
cd rag-chatbot
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run the Application

Run the Streamlit application:

```bash
streamlit run app.py
```

### 4. Interact with the Chatbot

1. Enter your `DeepSeek API Key` in the provided input field.
2. Upload a PDF document.
3. Ask questions about the content of the document.

---

## Project Structure

```plaintext
.
├── app.py              # Main application code
├── requirements.txt    # List of dependencies
├── README.md           # Documentation
```

---

## Key Technologies Used

1. **Streamlit**:
   - For building a user-friendly web interface.

2. **LangChain**:
   - For document loading, text splitting, and RAG pipeline.

3. **FAISS**:
   - For storing and querying vector embeddings.

4. **Sentence Transformers**:
   - For generating semantic embeddings of text chunks.

5. **PyPDF**:
   - For parsing PDF files.

6. **DeepSeek API**:
   - For question-answering capabilities.

---

## How It Works

1. **PDF Upload**:
   - The user uploads a PDF document.
   - The document is split into manageable text chunks.

2. **Embeddings Generation**:
   - Semantic embeddings are generated using `sentence-transformers`.

3. **Vector Storage**:
   - The embeddings are stored in a `FAISS` vector database for efficient retrieval.

4. **Question Answering**:
   - The user asks a question about the uploaded document.
   - The RAG pipeline retrieves relevant chunks and generates a response using the `DeepSeek` API.

---

## Troubleshooting

- **Error: `pypdf package not found`**
  Ensure `pypdf` is installed. Run:
  ```bash
  pip install pypdf
  ```

- **Error: `langchain-community module not found`**
  Ensure `langchain-community` is installed. Run:
  ```bash
  pip install langchain-community
  ```

- **Reprocessing PDF on Every Query**
  This issue is resolved by using `st.session_state` to persist the processed `vector_store`.

---

## Future Improvements

1. Add support for multiple file uploads.
2. Integrate additional language models.
3. Enhance the UI with better visualization of document content.
4. Add support for other document formats (e.g., Word, TXT).

---

## License

This project is licensed under the MIT License. See the `LICENSE` file for more details.

---

## Contributions

Contributions are welcome! Feel free to fork the repository and submit a pull request.

---

## Contact

For any queries or support, please contact:

- Name: [Sagun Chalise]
- Email: [[email protected]]


---


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference