jatinmehra committed on
Commit 8114e38 · 1 Parent(s): 1dc0983

Update README.md to enhance documentation with detailed features, architecture, and usage instructions

Files changed (1): README.md (+216, -0)

# PDF Insight Pro

An advanced PDF document analysis tool that combines RAG (Retrieval Augmented Generation) with agentic search capabilities to provide intelligent answers to queries about PDF documents.

## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Architecture](#architecture)
- [Technical Stack](#technical-stack)
- [Installation](#installation)
- [Usage](#usage)
- [API Endpoints](#api-endpoints)
- [Deployment](#deployment)
- [Android App](#android-app)
- [License](#license)

## Overview

PDF Insight Pro is a sophisticated document analysis tool that allows users to upload PDF documents and ask questions about their content. The system uses state-of-the-art RAG techniques, combining document chunking, embedding generation, similarity search, and LLM processing to provide accurate and contextually relevant answers.

The application employs an agentic approach that can augment the document's information with web search capabilities when needed, ensuring comprehensive and up-to-date responses.

## Features

- **PDF Document Processing**: Upload and process PDF documents with automated text extraction and chunking
- **Agentic RAG System**: Combines document retrieval with powerful LLM reasoning
- **Web Search Integration**: Verifies and augments document information via the Tavily search API
- **Session Management**: Persistent session handling for chat history and document context
- **Multiple LLM Support**: Choose from different language models (Llama 4 Scout, Llama 3.1, Llama 3.3)
- **FastAPI Backend**: High-performance API with async support
- **Responsive UI**: User-friendly interface that adapts to different screen sizes
- **Docker Containerization**: Easy deployment via a containerized application
- **Hugging Face Integration**: Automatic deployment to Hugging Face Spaces
- **Android Application**: Android client that wraps the web interface in a WebView

## Architecture

The application follows a modular architecture with these main components:

### Backend Components

1. **PDF Processing Module** (`preprocessing.py`; see the sketch after this list):
   - Document loading and text extraction using PyMuPDF
   - Intelligent chunking with metadata preservation
   - Embedding generation with sentence transformers
   - FAISS vector index for similarity search

2. **RAG Engine**:
   - Context retrieval based on semantic similarity
   - LLM integration using the Groq API
   - Agentic processing with tool-calling capabilities
   - Web search augmentation with the Tavily API

3. **API Layer** (`app.py`):
   - FastAPI framework for REST endpoints
   - Session management and persistence
   - File upload and processing
   - Chat interface and history management

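The sketch below shows roughly how the PDF Processing Module's pipeline fits together: PyMuPDF for text extraction, a simple chunker, Sentence Transformers for embeddings, and a FAISS index for retrieval. It is a minimal sketch, not the actual contents of `preprocessing.py`; the embedding model name and chunk size are assumptions.

```python
# Minimal sketch of the extract -> chunk -> embed -> index pipeline.
# Not the project's actual preprocessing.py; model name and chunk size are assumptions.
import faiss
import fitz  # PyMuPDF
from sentence_transformers import SentenceTransformer


def extract_pages(pdf_path: str) -> list[str]:
    """Return the plain text of each page in the PDF."""
    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def chunk_pages(pages: list[str], chunk_size: int = 500) -> list[str]:
    """Split page text into fixed-size word chunks (a stand-in for smarter chunking)."""
    chunks = []
    for page in pages:
        words = page.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks


# Embed the chunks and build a FAISS index for similarity search.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
chunks = chunk_pages(extract_pages("example.pdf"))
embeddings = model.encode(chunks, convert_to_numpy=True).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
```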

### Workflow

1. **Document Processing**:
   - User uploads a PDF document
   - System extracts text using PyMuPDF
   - Text is chunked into semantically meaningful segments
   - Embeddings are generated for each chunk
   - A FAISS index is built for efficient similarity search

2. **Query Processing** (see the sketch after this list):
   - User submits a question about the document
   - System retrieves relevant chunks using semantic similarity
   - Relevant chunks are combined into a context window
   - Context and query are sent to the LLM for processing
   - Optional: web search is used for fact verification

3. **Response Generation**:
   - LLM generates a response based on the provided context
   - If web search is enabled, additional information may be incorporated
   - Response is returned to the user
   - Chat history is updated and persisted

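To make steps 2 and 3 concrete, here is a hedged sketch of the query path, continuing the indexing sketch above: the question is embedded, the top-k chunks are retrieved from the FAISS index, and the context plus question are sent to a Groq-hosted model. The model name, prompt wording, and value of `k` are illustrative assumptions, not the project's exact code.

```python
# Sketch of query-time retrieval + answer generation (continues the indexing sketch above).
# Model name, prompt, and k are assumptions, not the project's exact values.
import os

from groq import Groq


def answer_question(question: str, k: int = 4) -> str:
    # 1. Retrieve the k most similar chunks from the FAISS index.
    query_vec = model.encode([question], convert_to_numpy=True).astype("float32")
    _, ids = index.search(query_vec, k)
    context = "\n\n".join(chunks[i] for i in ids[0])

    # 2. Send the context and question to the LLM via the Groq API.
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    completion = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # one of the selectable Llama models; exact id assumed
        messages=[
            {"role": "system", "content": "Answer using only the provided document context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```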

## Technical Stack

### Backend

- **Python 3.12**: Core programming language
- **FastAPI**: API framework with async support
- **PyMuPDF**: PDF processing library
- **LangChain**: Framework for LLM application development
- **FAISS**: Vector similarity search library from Facebook AI
- **Sentence Transformers**: Text embedding generation
- **Groq API**: LLM inference service
- **Tavily API**: Web search integration (see the example below)
- **Uvicorn**: ASGI server

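For reference, the web search augmentation typically boils down to a call like the one below via the `tavily-python` client; the query and the parameters shown are placeholders, and the exact options the project passes (result count, search depth) are assumptions.

```python
# Rough shape of a Tavily web search call used to augment or verify document answers.
# Query and parameters are placeholders; the project's exact configuration is an assumption.
import os

from tavily import TavilyClient

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
results = tavily.search("PDF Insight Pro FAISS similarity search", max_results=3)
for hit in results["results"]:
    print(hit["title"], hit["url"])
```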

### Frontend

- **HTML/CSS/JavaScript**: Core web technologies
- **Font Awesome**: Icon library
- **Highlight.js**: Code syntax highlighting
- **Marked.js**: Markdown rendering
- **Responsive Design**: Mobile-friendly interface

*Note: The frontend was developed with assistance from Claude 3.7 AI.*

### DevOps

- **Docker**: Containerization
- **GitHub Actions**: CI/CD pipeline
- **Hugging Face Spaces**: Deployment platform

## Installation

### Prerequisites

- Python 3.12+
- API keys for Groq and Tavily

### Local Setup

1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/PDF-Insight-Beta.git
   cd PDF-Insight-Beta
   ```

2. Create and activate a virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

4. Create a `.env` file with your API keys (loaded at startup; see the sketch after these steps):
   ```
   GROQ_API_KEY=your_groq_api_key
   TAVILY_API_KEY=your_tavily_api_key
   ```

5. Run the application:
   ```bash
   uvicorn app:app --host 0.0.0.0 --port 8000 --reload
   ```

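One common way these keys get picked up at startup is via `python-dotenv`; whether the project uses exactly this pattern is an assumption, but the idea looks like this:

```python
# Hedged sketch of loading the API keys from .env at startup (assumes python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into the process environment
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")
if not GROQ_API_KEY or not TAVILY_API_KEY:
    raise RuntimeError("Set GROQ_API_KEY and TAVILY_API_KEY in .env")
```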

### Docker Deployment

1. Build the Docker image:
   ```bash
   docker build -t pdf-insight-pro .
   ```

2. Run the container (the keys are mounted as secret files under `/run/secrets/`; see the sketch below):
   ```bash
   docker run -p 7860:7860 \
     --mount type=secret,id=GROQ_API_KEY,dst=/run/secrets/GROQ_API_KEY \
     --mount type=secret,id=TAVILY_API_KEY,dst=/run/secrets/TAVILY_API_KEY \
     pdf-insight-pro
   ```

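Because the keys arrive as files under `/run/secrets/` rather than as environment variables in this mode, the application needs to read them from disk. How the project actually resolves them is not documented here, so the following is only a hedged sketch of that fallback:

```python
# Hedged sketch: resolve an API key from the environment or from a mounted
# Docker secret file such as /run/secrets/GROQ_API_KEY. The project's actual
# secret-handling logic is an assumption.
import os
from pathlib import Path


def read_key(name: str) -> str | None:
    secret_file = Path("/run/secrets") / name
    if secret_file.exists():
        return secret_file.read_text().strip()
    return os.getenv(name)


GROQ_API_KEY = read_key("GROQ_API_KEY")
TAVILY_API_KEY = read_key("TAVILY_API_KEY")
```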

## Usage

1. Open the application in your browser at `http://localhost:8000`
2. Upload a PDF document using the interface
3. Wait for processing to complete
4. Ask questions about the document in the chat interface
5. Toggle the "Use web search" option for enhanced responses

## API Endpoints

- **GET `/`**: Redirects to the static HTML interface
- **POST `/upload-pdf`**: Upload and process a PDF document
  - Returns a session ID for subsequent queries
- **POST `/chat`**: Send a query about the uploaded document (see the example below)
  - Requires the session ID from a previous upload
  - Optional parameter to enable web search
- **POST `/chat-history`**: Retrieve chat history for a session
- **POST `/clear-history`**: Clear chat history for a session
- **POST `/remove-pdf`**: Remove the PDF and session data
- **GET `/models`**: List available language models

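A minimal scripted client for the upload-and-chat flow might look like the sketch below. The form field and JSON key names (`file`, `session_id`, `message`, `use_search`) are assumptions based on the descriptions above rather than a documented contract; FastAPI's auto-generated docs (normally served at `/docs`) show the exact schema.

```python
# Hypothetical client for the upload + chat flow; field and response key names are assumptions.
import requests

BASE = "http://localhost:8000"

# Upload a PDF and grab the session ID returned by /upload-pdf.
with open("example.pdf", "rb") as f:
    upload = requests.post(f"{BASE}/upload-pdf", files={"file": f})
upload.raise_for_status()
session_id = upload.json()["session_id"]  # assumed response key

# Ask a question about the document, optionally enabling web search.
chat = requests.post(
    f"{BASE}/chat",
    json={
        "session_id": session_id,          # assumed field name
        "message": "Summarize the introduction.",
        "use_search": False,               # assumed field name
    },
)
chat.raise_for_status()
print(chat.json())
```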

## Deployment

### Hugging Face Spaces

This project is configured for automatic deployment to Hugging Face Spaces using GitHub Actions. The workflow in `.github/workflows/sync_to_hf.yml` handles the deployment process.

To deploy to your own Space:

1. Fork this repository
2. Create a Hugging Face Space
3. Add your Hugging Face token as a GitHub secret named `HF_TOKEN`
4. Update the username and Space name in the workflow file
5. Push to the main branch to trigger deployment

## Android App

The repository includes an Android application that serves as a mobile interface to the web application. Rather than implementing a native client with direct API integration, the Android app uses a WebView component to load the deployed web interface from Hugging Face Spaces. This approach keeps the web and mobile experiences consistent while reducing maintenance overhead.

### Android App Features

- WebView integration with the deployed web application
- Splash screen with app branding
- Responsive design that adapts to the mobile interface
- Native Android navigation and user experience
- Direct access to the full functionality of the web application

### Implementation Details

The Android app is implemented in Java and consists of:

- `SplashActivity`: Displays the app logo and transitions to the main activity
- `MainActivity`: Contains a WebView component that loads the deployed web application
- WebView configuration: Enables JavaScript and DOM storage, and handles file uploads

## License

MIT