Spaces:
Running
Running
license: mit | |
title: PDF Insight PRO | |
sdk: docker | |
emoji: 💻 | |
colorFrom: blue | |
colorTo: green | |
short_description: Agentic RAG APP | |
# PDF Insight Pro | |
An advanced PDF document analysis tool that combines RAG (Retrieval Augmented Generation) with agentic search capabilities to provide intelligent answers to queries about PDF documents. | |
## Table of Contents | |
- [Overview](#overview) | |
- [Features](#features) | |
- [RAG SYSTEM PERFORMANCE](#rag-system-metrics) | |
- [Architecture](#architecture) | |
- [Technical Stack](#technical-stack) | |
- [Installation](#installation) | |
- [Usage](#usage) | |
- [API Endpoints](#api-endpoints) | |
- [Deployment](#deployment) | |
- [Android App](#android-app) | |
- [License](#license) | |
## Overview | |
PDF Insight Pro is a sophisticated document analysis tool that allows users to upload PDF documents and ask questions about their content. The system uses state-of-the-art RAG techniques, combining document chunking, embedding generation, similarity search, and LLM processing to provide accurate and contextually relevant answers. | |
The application employs an agentic approach that can augment the document's information with web search capabilities when needed, ensuring comprehensive and up-to-date responses. | |
## Features | |
- **PDF Document Processing**: Upload and process PDF documents with automated text extraction and chunking | |
- **Agentic RAG System**: Combines document retrieval with powerful LLM reasoning | |
- **Web Search Integration**: Verifies document information with Tavily search API integration | |
- **Session Management**: Persistent session handling for chat history and document context | |
- **Multiple LLM Support**: Choose from different language models (Llama 4 Scout, Llama 3.1, Llama 3.3) | |
- **FastAPI Backend**: High-performance API with async support | |
- **Responsive UI**: User-friendly interface adaptable to different screen sizes | |
- **Docker Containerization**: Easy deployment with containerized application | |
- **Hugging Face Integration**: Automatic deployment to Hugging Face Spaces | |
- **Android Application**: Native mobile client | |
## RAG System Metrics | |
1. **Key Metrics Overview**: | |
| Metric | Value | | |
| ----------------------------------- | ------- | | |
| **Semantic Similarity (Mean)** | `0.852` | | |
| **ROUGE-L F1 Score (Mean)** | `0.395` | | |
| **Semantic Similarity (Max)** | `1.000` | | |
| **ROUGE-L F1 Score (Max)** | `1.000` | | |
| **Semantic Similarity (Min)** | `0.592` | | |
| **ROUGE-L F1 Score (Min)** | `0.099` | | |
| **Standard Deviation (Similarity)** | `0.089` | | |
| **Standard Deviation (ROUGE-L F1)** | `0.217` | | |
2. **Quantile Distribution**: | |
| Percentile | Semantic Similarity | ROUGE-L F1 Score | | |
| ---------- | ------------------- | ---------------- | | |
| **25%** | `0.7946` | `0.2516` | | |
| **50%** | `0.8732` | `0.3256` | | |
| **75%** | `0.9181` | `0.4951` | | |
3. **Evaluation Status**: | |
| Status | Count | Percentage | | |
| ------ | ----- | ---------- | | |
| PASS | `64` | `85.3%` | | |
| FAIL | `11` | `14.7%` | | |
## Architecture | |
The application follows a modular architecture with these main components: | |
### System Architecture Diagram | |
```mermaid | |
--- | |
config: | |
theme: forest | |
look: neo | |
layout: dagre | |
--- | |
flowchart TD | |
subgraph subGraph0["Presentation Layer"] | |
direction TB | |
Browser["Web Browser UI"] | |
Android["Android WebView Client"] | |
end | |
subgraph subGraph1["API Layer"] | |
direction TB | |
APIGateway["FastAPI Entrypoints"] | |
ChatRoutes["chat_routes.py"] | |
SessionRoutes["session_routes.py"] | |
UploadRoutes["upload_routes.py"] | |
UtilityRoutes["utility_routes.py"] | |
AppMain["app.py"] | |
end | |
subgraph subGraph2["Config & Models"] | |
direction TB | |
ConfigLoader["config.py"] | |
DataModels["models.py"] | |
end | |
subgraph subGraph3["Service Layer"] | |
direction TB | |
RAGService["rag_service.py"] | |
LLMService["llm_service.py"] | |
SessionService["session_service.py"] | |
end | |
subgraph subGraph4["Utility Layer"] | |
direction TB | |
TextProc["text_processing.py"] | |
FaissUtil["faiss_utils.py"] | |
SessionUtil["session_utils.py"] | |
end | |
subgraph Storage["Storage"] | |
direction TB | |
UploadStore["/uploads (PDFs & sessions)"] | |
FAISSIndex["FAISS Index (ephemeral/disk)"] | |
end | |
subgraph subGraph6["Docker Container"] | |
direction TB | |
subGraph1 | |
subGraph2 | |
subGraph3 | |
subGraph4 | |
Storage | |
end | |
subgraph subGraph7["External & DevOps"] | |
direction TB | |
GroqAPI["Groq LLM API"] | |
TavilyAPI["Tavily Web Search API"] | |
CI["GitHub Actions CI/CD"] | |
HFS["HuggingFace Spaces"] | |
DockerfileNode["Dockerfile"] | |
end | |
subgraph subGraph8["Static Assets"] | |
direction TB | |
StaticApp["Static Web App"] | |
end | |
Browser -- HTTP JSON --> StaticApp | |
StaticApp -- HTTP JSON --> AppMain | |
Android -- HTTP JSON --> AppMain | |
AppMain -- routes --> ChatRoutes & SessionRoutes & UploadRoutes & UtilityRoutes | |
ChatRoutes -- calls --> RAGService | |
SessionRoutes -- calls --> SessionService | |
UploadRoutes -- calls --> TextProc | |
UtilityRoutes -- calls --> SessionUtil | |
RAGService -- uses --> LLMService & FaissUtil | |
RAGService -- calls --> GroqAPI & TavilyAPI | |
LLMService -- uses --> ConfigLoader | |
SessionService -- uses --> SessionUtil | |
TextProc -- writes/reads --> UploadStore | |
SessionUtil -- writes/reads --> UploadStore | |
FaissUtil -- reads/writes --> FAISSIndex | |
CI -- build & deploy --> DockerfileNode | |
DockerfileNode -- deploy --> HFS | |
Browser:::frontend | |
Android:::frontend | |
APIGateway:::api | |
ChatRoutes:::api | |
SessionRoutes:::api | |
UploadRoutes:::api | |
UtilityRoutes:::api | |
AppMain:::api | |
ConfigLoader:::service | |
DataModels:::service | |
RAGService:::service | |
LLMService:::service | |
SessionService:::service | |
TextProc:::util | |
FaissUtil:::util | |
SessionUtil:::util | |
UploadStore:::util | |
FAISSIndex:::util | |
GroqAPI:::external | |
TavilyAPI:::external | |
CI:::devops | |
HFS:::devops | |
DockerfileNode:::devops | |
StaticApp:::frontend | |
classDef frontend fill:#CCE5FF,stroke:#333,stroke-width:1px | |
classDef api fill:#DFFFD6,stroke:#333,stroke-width:1px | |
classDef service fill:#FFE5B4,stroke:#333,stroke-width:1px | |
classDef util fill:#E3E4FA,stroke:#333,stroke-width:1px | |
classDef external fill:#E0E0E0,stroke:#333,stroke-width:1px | |
classDef devops fill:#CCFFFF,stroke:#333,stroke-width:1px | |
click Android "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/Android%20App/app/src/main/res/layout/activity_splash.xml" | |
click ChatRoutes "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/api/chat_routes.py" | |
click SessionRoutes "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/api/session_routes.py" | |
click UploadRoutes "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/api/upload_routes.py" | |
click UtilityRoutes "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/api/utility_routes.py" | |
click AppMain "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/app.py" | |
click ConfigLoader "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/configs/config.py" | |
click DataModels "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/models/models.py" | |
click RAGService "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/services/rag_service.py" | |
click LLMService "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/services/llm_service.py" | |
click SessionService "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/services/session_service.py" | |
click TextProc "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/utils/text_processing.py" | |
click FaissUtil "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/utils/faiss_utils.py" | |
click SessionUtil "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/utils/session_utils.py" | |
click CI "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/.github/workflows/sync_to_hf.yml" | |
click DockerfileNode "https://github.com/jatin-mehra119/pdf-insight-beta/tree/main/Dockerfile" | |
click StaticApp "https://github.com/jatin-mehra119/pdf-insight-beta/blob/main/static/js/app.js" | |
``` | |
### Backend Components | |
1. **PDF Processing Module** (`preprocessing.py`): | |
- Document loading and text extraction using PyMuPDF | |
- Intelligent chunking with metadata preservation | |
- Embedding generation with sentence transformers | |
- FAISS vector index for similarity search | |
2. **RAG Engine**: | |
- Context retrieval based on semantic similarity | |
- LLM integration using Groq API | |
- Agentic processing with tool-calling capabilities | |
- Web search augmentation with Tavily API | |
3. **API Layer** (`app.py`): | |
- FastAPI framework for REST endpoints | |
- Session management and persistence | |
- File upload and processing | |
- Chat interface and history management | |
### Workflow | |
1. **Document Processing**: | |
- User uploads a PDF document | |
- System extracts text using PyMuPDF | |
- Text is chunked into semantically meaningful segments | |
- Embeddings are generated for each chunk | |
- A FAISS index is built for efficient similarity search | |
2. **Query Processing**: | |
- User submits a question about the document | |
- System retrieves relevant chunks using semantic similarity | |
- Relevant chunks are combined into a context window | |
- Context and query are sent to the LLM for processing | |
- Optional: Web search integration for fact verification | |
3. **Response Generation**: | |
- LLM generates a response based on the provided context | |
- If web search is enabled, additional information may be incorporated | |
- Response is returned to the user | |
- Chat history is updated and persisted | |
## Project Structure | |
The project is organized into a modular architecture with clear separation of concerns: | |
``` | |
PDF-Insight-Beta/ | |
├── app.py # Main FastAPI application entry point | |
├── gen_dataset.py # Dataset generation and RAG evaluation scripts | |
├── test_RAG.ipynb # Jupyter notebook for RAG system testing and metrics | |
├── requirements.txt # Python dependencies | |
├── Dockerfile # Container configuration for deployment | |
├── LICENSE # MIT license file | |
├── README.md # Project documentation | |
├── README_hf.md # Hugging Face Spaces specific documentation | |
├── | |
├── api/ # API route handlers (modular FastAPI routes) | |
│ ├── __init__.py # Exports all route handlers | |
│ ├── chat_routes.py # Chat and conversation management endpoints | |
│ ├── session_routes.py # Session lifecycle management | |
│ ├── upload_routes.py # PDF upload and processing endpoints | |
│ └── utility_routes.py # Utility endpoints (models, health checks) | |
├── | |
├── configs/ # Configuration management | |
│ └── config.py # Centralized configuration and environment variables | |
├── | |
├── models/ # Pydantic data models | |
│ └── models.py # Request/response models for API validation | |
├── | |
├── services/ # Core business logic services | |
│ ├── __init__.py # Service module initialization | |
│ ├── llm_service.py # Language model integration and management | |
│ ├── rag_service.py # RAG implementation with agentic capabilities | |
│ └── session_service.py # Session persistence and management | |
├── | |
├── utils/ # Utility functions and helpers | |
│ ├── __init__.py # Utility module initialization | |
│ ├── faiss_utils.py # FAISS vector database operations | |
│ ├── session_utils.py # Session data serialization/deserialization | |
│ └── text_processing.py # PDF text extraction and chunking utilities | |
├── | |
├── static/ # Frontend web application | |
│ ├── index.html # Main web interface | |
│ ├── css/ | |
│ │ └── styles.css # Application styling and responsive design | |
│ └── js/ | |
│ └── app.js # Frontend JavaScript for user interactions | |
├── | |
├── development_scripts/ # Legacy and development utilities | |
│ ├── app.py # Original monolithic application (deprecated) | |
│ └── preprocessing.py # Original preprocessing functions (deprecated) | |
├── | |
├── uploads/ # Temporary storage for uploaded files and sessions | |
│ ├── *.pdf # Uploaded PDF documents | |
│ └── *_session.pkl # Serialized session data | |
├── | |
└── Android App/ # Native Android application | |
├── app/ # Android app source code | |
│ ├── src/main/java/com/jatinmehra/ # Java source files | |
│ ├── src/main/res/ # Android resources (layouts, drawables, etc.) | |
│ └── AndroidManifest.xml # Android app configuration | |
├── gradle/ # Gradle build system files | |
└── build.gradle # Project build configuration | |
``` | |
### Key Components Description | |
#### Core Application Files | |
- **`app.py`**: Main FastAPI application that orchestrates all components and sets up the web server | |
- **`gen_dataset.py`**: Comprehensive evaluation script for RAG system performance using the neural-bridge dataset | |
- **`test_RAG.ipynb`**: Interactive Jupyter notebook for testing RAG capabilities and analyzing metrics | |
#### API Layer (`api/`) | |
- **`chat_routes.py`**: Handles chat interactions, query processing, and conversation flow | |
- **`session_routes.py`**: Manages session lifecycle, history retrieval, and cleanup operations | |
- **`upload_routes.py`**: Processes PDF uploads, text extraction, and document indexing | |
- **`utility_routes.py`**: Provides system utilities like model listing and health checks | |
#### Configuration (`configs/`) | |
- **`config.py`**: Centralizes all application settings, API keys, model configurations, and environment variables | |
#### Data Models (`models/`) | |
- **`models.py`**: Defines Pydantic models for request/response validation and API documentation | |
#### Business Logic (`services/`) | |
- **`llm_service.py`**: Manages language model interactions, prompt engineering, and response generation | |
- **`rag_service.py`**: Implements the core RAG pipeline with agentic search capabilities and tool integration | |
- **`session_service.py`**: Handles session persistence, chat history, and user context management | |
#### Utilities (`utils/`) | |
- **`faiss_utils.py`**: Provides FAISS vector database operations for similarity search and indexing | |
- **`session_utils.py`**: Handles session serialization, deserialization, and data persistence | |
- **`text_processing.py`**: PDF text extraction, intelligent chunking, and preprocessing utilities | |
#### Frontend (`static/`) | |
- **`index.html`**: Responsive web interface with modern UI design | |
- **`styles.css`**: CSS styling with mobile-first responsive design principles | |
- **`app.js`**: JavaScript for dynamic interactions, file uploads, and chat functionality | |
#### Mobile Application (`Android App/`) | |
- **Native Android client**: WebView-based mobile application that interfaces with the web app | |
- **Java source code**: Activity management, splash screen, and WebView configuration | |
- **Android resources**: UI layouts, icons, and mobile-specific configurations | |
## Technical Stack | |
### Backend | |
- **Python 3.12**: Core programming language | |
- **FastAPI**: API framework with async support | |
- **PyMuPDF**: PDF processing library | |
- **LangChain**: Framework for LLM application development | |
- **FAISS**: Vector similarity search library from Facebook AI | |
- **Sentence Transformers**: Text embedding generation | |
- **Groq API**: LLM inference service | |
- **Tavily API**: Web search integration | |
- **Uvicorn**: ASGI server | |
### Frontend | |
- **HTML/CSS/JavaScript**: Core web technologies | |
- **Font Awesome**: Icon library | |
- **Highlight.js**: Code syntax highlighting | |
- **Marked.js**: Markdown rendering | |
- **Responsive Design**: Mobile-friendly interface | |
*Note: The frontend was developed with assistance from Claude 3.7 AI.* | |
### DevOps | |
- **Docker**: Containerization | |
- **GitHub Actions**: CI/CD pipeline | |
- **Hugging Face Spaces**: Deployment platform | |
## Installation | |
### Prerequisites | |
- Python 3.12+ | |
- API keys for Groq and Tavily | |
### Local Setup | |
1. Clone the repository: | |
```bash | |
git clone https://github.com/Jatin-Mehra119/PDF-Insight-Beta.git | |
cd PDF-Insight-Beta | |
``` | |
2. Create and activate a virtual environment: | |
```bash | |
python -m venv venv | |
source venv/bin/activate # On Windows: venv\Scripts\activate | |
``` | |
3. Install dependencies: | |
```bash | |
pip install -r requirements.txt | |
``` | |
4. Create a `.env` file with your API keys: | |
``` | |
GROQ_API_KEY=your_groq_api_key | |
TAVILY_API_KEY=your_tavily_api_key | |
``` | |
5. Run the application: | |
```bash | |
uvicorn app:app --host 0.0.0.0 --port 8000 --reload | |
``` | |
### Docker Deployment | |
1. Build the Docker image: | |
```bash | |
docker build -t pdf-insight-pro . | |
``` | |
2. Run the container: | |
```bash | |
docker run -p 7860:7860 \ | |
--mount type=secret,id=GROQ_API_KEY,dst=/run/secrets/GROQ_API_KEY \ | |
--mount type=secret,id=TAVILY_API_KEY,dst=/run/secrets/TAVILY_API_KEY \ | |
pdf-insight-pro | |
``` | |
## Usage | |
1. Open the application in your browser at `http://localhost:8000` | |
2. Upload a PDF document using the interface | |
3. Wait for processing to complete | |
4. Ask questions about the document in the chat interface | |
5. Toggle the "Use web search" option for enhanced responses | |
## API Endpoints | |
- **GET `/`**: Redirect to static HTML interface | |
- **POST `/upload-pdf`**: Upload and process a PDF document | |
- Returns a session ID for subsequent queries | |
- **POST `/chat`**: Send a query about the uploaded document | |
- Requires session ID from previous upload | |
- Optional parameter to enable web search | |
- **POST `/chat-history`**: Retrieve chat history for a session | |
- **POST `/clear-history`**: Clear chat history for a session | |
- **POST `/remove-pdf`**: Remove PDF and session data | |
- **GET `/models`**: List available language models | |
## Deployment | |
### Hugging Face Spaces | |
This project is configured for automatic deployment to Hugging Face Spaces using GitHub Actions. The workflow in `.github/workflows/sync_to_hf.yml` handles the deployment process. | |
To deploy to your own space: | |
1. Fork this repository | |
2. Create a Hugging Face Space | |
3. Add your Hugging Face token as a GitHub secret named `HF_TOKEN` | |
4. Update the username and space name in the workflow file | |
5. Push to the main branch to trigger deployment | |
## Android App | |
The repository includes an Android application that serves as a mobile interface to the web application. Rather than implementing a native client with direct API integration, the Android app utilizes a WebView component to load the deployed web interface from Hugging Face Spaces. This approach ensures consistency between the web and mobile experiences while reducing maintenance overhead. | |
### Android App Features | |
- WebView integration to the deployed web application | |
- Splash screen with app branding | |
- Responsive design that adapts to the mobile interface | |
- Native Android navigation and user experience | |
- Direct access to the full functionality of the web application | |
### Implementation Details | |
The Android app is implemented using Java and consists of: | |
- SplashActivity: Displays the app logo and transitions to the main activity | |
- MainActivity: Contains a WebView component that loads the deployed web application | |
- WebView configuration: Enables JavaScript, DOM storage, and handles file uploads | |
## License | |
MIT |