---
title: Deepseek RAG Chat Bot
emoji: π
colorFrom: red
colorTo: pink
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Deepseek-RAG-Chat-Bot
---
# RAG-Powered Chatbot with Streamlit

This project is a Retrieval-Augmented Generation (RAG) chatbot built using Streamlit. It allows users to upload a PDF document, process it, and ask questions about its content. The application efficiently processes the document once and uses vector-based retrieval to answer queries.

## Features

- Upload PDF documents and process them into chunks for efficient querying.
- Generate semantic embeddings using `sentence-transformers`.
- Store embeddings in a `FAISS` vector database for efficient retrieval.
- Use the `DeepSeek` API for question-answering capabilities.
- Built with Streamlit for an interactive and user-friendly UI.

## Requirements

- Python 3.8 or higher

### Dependencies

Install the required Python libraries:

```text
streamlit==1.25.0
langchain==0.81.0
langchain-community==0.1.2
faiss-cpu==1.7.4
sentence-transformers==2.2.2
pypdf==3.8.1
```

To install all dependencies:

```bash
pip install -r requirements.txt
```

## Setup and Usage

### 1. Clone the Repository

```bash
git clone https://github.com/your-username/rag-chatbot.git
cd rag-chatbot
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run the Application

Run the Streamlit application:

```bash
streamlit run app.py
```

### 4. Interact with the Chatbot

- Enter your DeepSeek API key in the provided input field.
- Upload a PDF document.
- Ask questions about the content of the document.
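
The exact widgets depend on app.py, but the inputs described above typically look something like this minimal Streamlit sketch (labels are illustrative):

```python
# Minimal sketch of the Streamlit inputs (widget labels are illustrative).
import streamlit as st

api_key = st.text_input("DeepSeek API Key", type="password")
uploaded_pdf = st.file_uploader("Upload a PDF document", type="pdf")
question = st.text_input("Ask a question about the document")
```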

## Project Structure

```text
.
├── app.py            # Main application code
├── requirements.txt  # List of dependencies
└── README.md         # Documentation
```

## Key Technologies Used

- **Streamlit**: for building a user-friendly web interface.
- **LangChain**: for document loading, text splitting, and the RAG pipeline.
- **FAISS**: for storing and querying vector embeddings.
- **Sentence Transformers**: for generating semantic embeddings of text chunks.
- **PyPDF**: for parsing PDF files.
- **DeepSeek API**: for question-answering capabilities.

## How It Works

### PDF Upload

- The user uploads a PDF document.
- The document is split into manageable text chunks.
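
A minimal sketch of this step using the LangChain and PyPDF packages from requirements.txt; the file path, chunk size, and overlap are illustrative, and import paths can vary between LangChain versions:

```python
# Minimal sketch: load a PDF and split it into chunks (illustrative values).
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("uploaded_document.pdf")  # hypothetical path saved from the upload
documents = loader.load()                      # one Document per page

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # characters per chunk (illustrative)
    chunk_overlap=200,  # overlap preserves context across chunk boundaries
)
chunks = splitter.split_documents(documents)
```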

### Embeddings Generation

- Semantic embeddings are generated for each chunk using `sentence-transformers`.
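
A minimal sketch of this step; the `all-MiniLM-L6-v2` model name is an assumption, and app.py may use a different sentence-transformers model:

```python
# Minimal sketch: create sentence-transformers embeddings for the chunks.
from langchain_community.embeddings import HuggingFaceEmbeddings

# The model name is illustrative; any sentence-transformers model can be used.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
```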

### Vector Storage

- The embeddings are stored in a `FAISS` vector database for efficient retrieval.
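
Continuing the sketches above, the chunks and embeddings can be indexed in an in-memory FAISS store:

```python
# Minimal sketch: index the chunks in FAISS and expose a retriever.
# `chunks` and `embeddings` come from the previous sketches.
from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_documents(chunks, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 4})  # top-4 chunks per query
```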

### Question Answering

- The user asks a question about the uploaded document.
- The RAG pipeline retrieves relevant chunks and generates a response using the `DeepSeek` API.
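
A minimal sketch of this step, assuming DeepSeek's OpenAI-compatible endpoint and the `openai` client package (not listed in requirements.txt); the model name, prompt, and retrieval call are illustrative and may differ from what app.py does:

```python
# Minimal sketch: retrieve relevant chunks and ask DeepSeek to answer from them.
# `retriever` comes from the FAISS sketch above.
from openai import OpenAI

deepseek_api_key = "sk-..."                        # the key entered in the Streamlit UI
question = "What is this document about?"          # example query

docs = retriever.get_relevant_documents(question)  # newer LangChain: retriever.invoke(question)
context = "\n\n".join(doc.page_content for doc in docs)

client = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com")
response = client.chat.completions.create(
    model="deepseek-chat",  # illustrative model name
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
answer = response.choices[0].message.content
```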

## Troubleshooting

**Error: `pypdf` package not found**
Ensure `pypdf` is installed by running `pip install pypdf`.

**Error: `langchain-community` module not found**
Ensure `langchain-community` is installed by running `pip install langchain-community`.

**Reprocessing the PDF on every query**
This issue is resolved by using `st.session_state` to persist the processed `vector_store`.
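
A minimal sketch of this `st.session_state` pattern, assuming a hypothetical `build_vector_store` helper that wraps the chunking, embedding, and FAISS steps from "How It Works":

```python
# Minimal sketch: build the FAISS index once and reuse it across Streamlit reruns.
import streamlit as st

uploaded_pdf = st.file_uploader("Upload a PDF document", type="pdf")

if uploaded_pdf is not None and "vector_store" not in st.session_state:
    # build_vector_store is a hypothetical helper wrapping chunking,
    # embedding, and FAISS indexing.
    st.session_state.vector_store = build_vector_store(uploaded_pdf)

vector_store = st.session_state.get("vector_store")
```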

## Future Improvements

- Add support for multiple file uploads.
- Integrate additional language models.
- Enhance the UI with better visualization of document content.
- Add support for other document formats (e.g., Word, TXT).

## License

This project is licensed under the MIT License. See the `LICENSE` file for more details.

## Contributions

Contributions are welcome! Feel free to fork the repository and submit a pull request.

## Contact

For any queries or support, please contact:

- Name: [Sagun Chalise]
- Email: [[email protected]]
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference