jatinmehra commited on
Commit
05d6c07
·
1 Parent(s): 8114e38

Enhance README.md with detailed features, architecture, technical stack, installation instructions, and usage guidelines

Browse files
Files changed (1) hide show
  1. README_hf.md +217 -1
README_hf.md CHANGED
@@ -8,4 +8,220 @@ colorTo: green
8
  short_description: Agentic RAG APP
9
  ---
10
 
11
- # PDF Insight Pro
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
  short_description: Agentic RAG APP
9
  ---
10
 
11
+ # PDF Insight Pro
12
+
13
+ An advanced PDF document analysis tool that combines RAG (Retrieval Augmented Generation) with agentic search capabilities to provide intelligent answers to queries about PDF documents.
14
+
15
+ ## Table of Contents
16
+
17
+ - [Overview](#overview)
18
+ - [Features](#features)
19
+ - [Architecture](#architecture)
20
+ - [Technical Stack](#technical-stack)
21
+ - [Installation](#installation)
22
+ - [Usage](#usage)
23
+ - [API Endpoints](#api-endpoints)
24
+ - [Deployment](#deployment)
25
+ - [Android App](#android-app)
26
+ - [License](#license)
27
+
28
+ ## Overview
29
+
30
+ PDF Insight Pro is a sophisticated document analysis tool that allows users to upload PDF documents and ask questions about their content. The system uses state-of-the-art RAG techniques, combining document chunking, embedding generation, similarity search, and LLM processing to provide accurate and contextually relevant answers.
31
+
32
+ The application employs an agentic approach that can augment the document's information with web search capabilities when needed, ensuring comprehensive and up-to-date responses.
33
+
34
+ ## Features
35
+
36
+ - **PDF Document Processing**: Upload and process PDF documents with automated text extraction and chunking
37
+ - **Agentic RAG System**: Combines document retrieval with powerful LLM reasoning
38
+ - **Web Search Integration**: Verifies document information with Tavily search API integration
39
+ - **Session Management**: Persistent session handling for chat history and document context
40
+ - **Multiple LLM Support**: Choose from different language models (Llama 4 Scout, Llama 3.1, Llama 3.3)
41
+ - **FastAPI Backend**: High-performance API with async support
42
+ - **Responsive UI**: User-friendly interface adaptable to different screen sizes
43
+ - **Docker Containerization**: Easy deployment with containerized application
44
+ - **Hugging Face Integration**: Automatic deployment to Hugging Face Spaces
45
+ - **Android Application**: Native mobile client
46
+
47
+ ## Architecture
48
+
49
+ The application follows a modular architecture with these main components:
50
+
51
+ ### Backend Components
52
+
53
+ 1. **PDF Processing Module** (`preprocessing.py`):
54
+ - Document loading and text extraction using PyMuPDF
55
+ - Intelligent chunking with metadata preservation
56
+ - Embedding generation with sentence transformers
57
+ - FAISS vector index for similarity search
58
+
59
+ 2. **RAG Engine**:
60
+ - Context retrieval based on semantic similarity
61
+ - LLM integration using Groq API
62
+ - Agentic processing with tool-calling capabilities
63
+ - Web search augmentation with Tavily API
64
+
65
+ 3. **API Layer** (`app.py`):
66
+ - FastAPI framework for REST endpoints
67
+ - Session management and persistence
68
+ - File upload and processing
69
+ - Chat interface and history management
70
+
71
+ ### Workflow
72
+
73
+ 1. **Document Processing**:
74
+ - User uploads a PDF document
75
+ - System extracts text using PyMuPDF
76
+ - Text is chunked into semantically meaningful segments
77
+ - Embeddings are generated for each chunk
78
+ - A FAISS index is built for efficient similarity search
79
+
80
+ 2. **Query Processing**:
81
+ - User submits a question about the document
82
+ - System retrieves relevant chunks using semantic similarity
83
+ - Relevant chunks are combined into a context window
84
+ - Context and query are sent to the LLM for processing
85
+ - Optional: Web search integration for fact verification
86
+
87
+ 3. **Response Generation**:
88
+ - LLM generates a response based on the provided context
89
+ - If web search is enabled, additional information may be incorporated
90
+ - Response is returned to the user
91
+ - Chat history is updated and persisted
92
+
93
+ ## Technical Stack
94
+
95
+ ### Backend
96
+ - **Python 3.12**: Core programming language
97
+ - **FastAPI**: API framework with async support
98
+ - **PyMuPDF**: PDF processing library
99
+ - **LangChain**: Framework for LLM application development
100
+ - **FAISS**: Vector similarity search library from Facebook AI
101
+ - **Sentence Transformers**: Text embedding generation
102
+ - **Groq API**: LLM inference service
103
+ - **Tavily API**: Web search integration
104
+ - **Uvicorn**: ASGI server
105
+
106
+ ### Frontend
107
+ - **HTML/CSS/JavaScript**: Core web technologies
108
+ - **Font Awesome**: Icon library
109
+ - **Highlight.js**: Code syntax highlighting
110
+ - **Marked.js**: Markdown rendering
111
+ - **Responsive Design**: Mobile-friendly interface
112
+
113
+ *Note: The frontend was developed with assistance from Claude 3.7 AI.*
114
+
115
+ ### DevOps
116
+ - **Docker**: Containerization
117
+ - **GitHub Actions**: CI/CD pipeline
118
+ - **Hugging Face Spaces**: Deployment platform
119
+
120
+ ## Installation
121
+
122
+ ### Prerequisites
123
+ - Python 3.12+
124
+ - API keys for Groq and Tavily
125
+
126
+ ### Local Setup
127
+
128
+ 1. Clone the repository:
129
+ ```bash
130
+ git clone https://github.com/yourusername/PDF-Insight-Beta.git
131
+ cd PDF-Insight-Beta
132
+ ```
133
+
134
+ 2. Create and activate a virtual environment:
135
+ ```bash
136
+ python -m venv venv
137
+ source venv/bin/activate # On Windows: venv\Scripts\activate
138
+ ```
139
+
140
+ 3. Install dependencies:
141
+ ```bash
142
+ pip install -r requirements.txt
143
+ ```
144
+
145
+ 4. Create a `.env` file with your API keys:
146
+ ```
147
+ GROQ_API_KEY=your_groq_api_key
148
+ TAVILY_API_KEY=your_tavily_api_key
149
+ ```
150
+
151
+ 5. Run the application:
152
+ ```bash
153
+ uvicorn app:app --host 0.0.0.0 --port 8000 --reload
154
+ ```
155
+
156
+ ### Docker Deployment
157
+
158
+ 1. Build the Docker image:
159
+ ```bash
160
+ docker build -t pdf-insight-pro .
161
+ ```
162
+
163
+ 2. Run the container:
164
+ ```bash
165
+ docker run -p 7860:7860 \
166
+ --mount type=secret,id=GROQ_API_KEY,dst=/run/secrets/GROQ_API_KEY \
167
+ --mount type=secret,id=TAVILY_API_KEY,dst=/run/secrets/TAVILY_API_KEY \
168
+ pdf-insight-pro
169
+ ```
170
+
171
+ ## Usage
172
+
173
+ 1. Open the application in your browser at `http://localhost:8000`
174
+ 2. Upload a PDF document using the interface
175
+ 3. Wait for processing to complete
176
+ 4. Ask questions about the document in the chat interface
177
+ 5. Toggle the "Use web search" option for enhanced responses
178
+
179
+ ## API Endpoints
180
+
181
+ - **GET `/`**: Redirect to static HTML interface
182
+ - **POST `/upload-pdf`**: Upload and process a PDF document
183
+ - Returns a session ID for subsequent queries
184
+ - **POST `/chat`**: Send a query about the uploaded document
185
+ - Requires session ID from previous upload
186
+ - Optional parameter to enable web search
187
+ - **POST `/chat-history`**: Retrieve chat history for a session
188
+ - **POST `/clear-history`**: Clear chat history for a session
189
+ - **POST `/remove-pdf`**: Remove PDF and session data
190
+ - **GET `/models`**: List available language models
191
+
192
+ ## Deployment
193
+
194
+ ### Hugging Face Spaces
195
+
196
+ This project is configured for automatic deployment to Hugging Face Spaces using GitHub Actions. The workflow in `.github/workflows/sync_to_hf.yml` handles the deployment process.
197
+
198
+ To deploy to your own space:
199
+
200
+ 1. Fork this repository
201
+ 2. Create a Hugging Face Space
202
+ 3. Add your Hugging Face token as a GitHub secret named `HF_TOKEN`
203
+ 4. Update the username and space name in the workflow file
204
+ 5. Push to the main branch to trigger deployment
205
+
206
+ ## Android App
207
+
208
+ The repository includes an Android application that serves as a mobile interface to the web application. Rather than implementing a native client with direct API integration, the Android app utilizes a WebView component to load the deployed web interface from Hugging Face Spaces. This approach ensures consistency between the web and mobile experiences while reducing maintenance overhead.
209
+
210
+ ### Android App Features
211
+
212
+ - WebView integration to the deployed web application
213
+ - Splash screen with app branding
214
+ - Responsive design that adapts to the mobile interface
215
+ - Native Android navigation and user experience
216
+ - Direct access to the full functionality of the web application
217
+
218
+ ### Implementation Details
219
+
220
+ The Android app is implemented using Java and consists of:
221
+ - SplashActivity: Displays the app logo and transitions to the main activity
222
+ - MainActivity: Contains a WebView component that loads the deployed web application
223
+ - WebView configuration: Enables JavaScript, DOM storage, and handles file uploads
224
+
225
+ ## License
226
+
227
+ MIT