
	# ColPali 🤝 Vespa - Visual Retrieval System

A visual document retrieval system that combines ColPali (a retrieval model that applies ColBERT-style late interaction to PaliGemma image-patch embeddings) with Vespa for scalable document search and question answering.

	## 🌟 Features

	### 🔍 Visual Document Search

	- Multi-modal retrieval: Search through PDF documents using natural language queries
- Visual understanding: ColPali embeds page images directly, so text, tables, figures, and layout are searchable without a separate OCR step
	- Token-level similarity maps: Visualize exactly which parts of documents match your query
	- Multiple ranking algorithms: Choose between hybrid, semantic, and other ranking methods

	### 🧠 AI-Powered Chat

	- Intelligent Q&A: Ask questions about retrieved documents using Google Gemini 2.0
	- Context-aware responses: AI analyzes document images to provide accurate answers
	- Real-time streaming: Get responses as they're generated

	### ⚡ Scalable Infrastructure

	- Vespa integration: Enterprise-grade search platform for large document collections
- On-demand processing: Search results and similarity maps are generated per query; latency depends on your hardware (see Performance Examples below)
	- Cloud-ready: Supports Vespa Cloud deployment with secure authentication

	## 🏗️ Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend     │    │   Vespa Cloud   │
│   (Browser)     │    │   (Your Local   │    │    (Remote)     │
│                 │    │    Computer)    │    │                 │
│ • Search UI     │◄──►│ • ColPali Model │◄──►│ • Document Store│
│ • Similarity    │    │ • Query Proc.   │    │ • Vector Search │
│   Maps          │    │ • Sim Map Gen.  │    │ • Ranking       │
│ • Chat Interface│    │ • Gemini Int.   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         ↑                      ↑                      ↑
    Web Browser             LOCAL AI            REMOTE Storage
```

	### 🏠 LOCAL Processing (Your Computer)

All ColPali model inference happens on your local machine; only the optional Gemini chat calls a remote API:

- ColPali Model: Runs locally on your GPU/CPU (~7GB model)
- Document Processing: PDF → Images → Embeddings (local)
- Query Processing: Text → Embeddings (local)
- Similarity Maps: Visual attention generation (local)
- Gemini Chat: Retrieved images are prepared locally, then sent to the Gemini API (remote)

	Device Detection:

```python
from colpali_engine.utils.torch_utils import get_torch_device

device = get_torch_device("auto")  # Detects CUDA, MPS (Apple), or CPU
print(f"Using device: {device}")  # Shows the hardware in use
```
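To check what an "auto" setting is likely to pick before loading the model, the usual preference order (CUDA, then Apple MPS, then CPU) can be reproduced with plain PyTorch calls. This is a sketch under that assumed ordering, not the `colpali_engine` source; `detect_device` is a hypothetical helper name.

```python
def detect_device() -> str:
    """Return "cuda", "mps", or "cpu" (hypothetical helper, assumed priority order)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch nothing can be accelerated, so report CPU
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists on PyTorch 1.12+; guard for older builds
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(f"Using device: {detect_device()}")
```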

	### ☁️ REMOTE Processing (Vespa Cloud)

	Only storage and search index operations happen remotely:

	- Document Storage: Stores processed embeddings (not raw models)
	- Vector Search: Fast similarity search across document collection
	- Query Routing: Handles search requests and ranking
	- Metadata Storage: Document titles, URLs, page numbers

	### 🔄 Complete Data Flow

	#### Document Upload Process:

	1. LOCAL: Your computer downloads PDF from URL
	2. LOCAL: ColPali converts PDF pages to images
	3. LOCAL: ColPali generates visual embeddings (1024 patches × 128 dims)
	4. LOCAL: Embeddings converted to binary format for efficiency
	5. REMOTE: Binary embeddings uploaded to Vespa Cloud
	6. REMOTE: Vespa indexes embeddings for fast search
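Step 4 can be illustrated with sign-based binarization: each of the 128 float dimensions becomes one bit (1 if positive), and the bits are packed into 16 bytes, which Vespa can store as an int8 tensor. This pure-Python sketch assumes that scheme (it produces the same bytes as `numpy.packbits(np.where(v > 0, 1, 0))` followed by a cast to int8); `binarize` and `to_int8` are hypothetical helper names.

```python
def binarize(vector: list[float]) -> bytes:
    """Pack a float vector into bits (1 where value > 0), most significant bit first."""
    if len(vector) % 8:
        raise ValueError("vector length must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(vector), 8):
        byte = 0
        for value in vector[start:start + 8]:
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def to_int8(packed: bytes) -> list[int]:
    """Reinterpret unsigned bytes as signed int8 values (e.g. 0xFF -> -1)."""
    return [b - 256 if b > 127 else b for b in packed]


patch = [0.3, -0.1, 0.7, -0.5, 0.0, 0.2, -0.9, 0.4] * 16  # one 128-dim patch embedding
print(len(binarize(patch)))  # 16 bytes per patch
print(1024 * len(binarize(patch)))  # 16384 bytes (~16 KB) per 1024-patch page
```

Binarization trades some ranking precision for a 32x reduction versus float32 (512 KB per page at 1024 × 128 × 4 bytes).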

	#### Search Query Process:

	1. LOCAL: You enter search query in browser
	2. LOCAL: ColPali processes query → generates query embeddings
	3. REMOTE: Query embeddings sent to Vespa Cloud
	4. REMOTE: Vespa searches document index, returns matches
	5. LOCAL: ColPali generates similarity maps for results
	6. BROWSER: Results displayed with visual attention maps
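ColPali-style ranking is usually described as late interaction (MaxSim): every query-token embedding is compared with every page-patch embedding, the best match per query token is kept, and those maxima are summed. The sketch below assumes plain dot-product similarity over float vectors; a Vespa rank profile may compute the same idea over the binarized embeddings (for example via Hamming distance). `maxsim` is a hypothetical name.

```python
def maxsim(query: list[list[float]], page: list[list[float]]) -> float:
    """Late-interaction score: for each query token, take its best-matching patch, then sum."""
    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, p) for p in page) for q in query)


query_tokens = [[1.0, 0.0], [0.0, 1.0]]
page_patches = [[1.0, 0.0], [0.5, 0.5]]
print(maxsim(query_tokens, page_patches))  # 1.5 (1.0 for token 1 + 0.5 for token 2)
```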

	#### AI Chat Process:

	1. LOCAL: Retrieved document images processed by your machine
	2. REMOTE: Images + query sent to Google Gemini API
	3. REMOTE: Gemini generates response based on visual content
	4. BROWSER: Streaming response displayed in real-time
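The streaming step relies on Server-Sent Events (the `/get-message` route is listed as SSE in the API table). On the wire, each chunk is one or more `data:` lines followed by a blank line. The sketch shows only that framing; `sse_event` and `stream_as_sse` are hypothetical names, and in the app the chunks would come from the Gemini response stream.

```python
from typing import Iterable, Iterator


def sse_event(text: str) -> str:
    """Frame one chunk as an SSE message: one 'data:' line per text line, then a blank line."""
    return "".join(f"data: {line}\n" for line in text.split("\n")) + "\n"


def stream_as_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Yield SSE-framed messages as chunks arrive, so the browser can render them incrementally."""
    for chunk in chunks:
        yield sse_event(chunk)


print(repr(sse_event("Hello")))  # 'data: Hello\n\n'
```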

	### Core Components

	- ColPali Model: Visual-language model for document understanding (LOCAL)
	- Vespa Search: Distributed search and storage engine (REMOTE)
	- FastHTML Frontend: Modern, responsive web interface (BROWSER)
	- Gemini Integration: AI-powered question answering (REMOTE API)
	- Similarity Map Generator: Visual attention visualization (LOCAL)

	## 💻 System Requirements

	### LOCAL Machine Requirements (For AI Processing)

	Minimum:

	- CPU: Modern multi-core processor (Intel/AMD/Apple Silicon)
	- RAM: 8GB+ (16GB recommended)
	- Storage: 10GB free space (for model cache)
	- Python: 3.10+ (< 3.13)

	Recommended:

	- GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3070/4060 or better)
	- Apple: M1/M2/M3 Mac (uses Metal Performance Shaders)
	- RAM: 16GB+ for smoother processing
	- Storage: SSD for faster model loading

	Performance Examples:

	- RTX 4090: ~1-2 seconds per query
	- RTX 3070: ~3-5 seconds per query
	- Apple M2: ~4-6 seconds per query
	- CPU Only: ~15-30 seconds per query

	### REMOTE Requirements (Vespa Cloud)

	What you need:

	- Vespa Cloud account (handles all remote processing)
	- Internet connection (for uploading embeddings and search queries)
	- Authentication tokens (provided by Vespa Cloud)

	What Vespa Cloud provides:

	- Scalable storage for any number of documents
	- Sub-second search across millions of embeddings
	- High availability with automatic failover
- Deployment to one or more cloud regions

	## 💰 Cost Breakdown

	### FREE Components

	- ColPali Model: Open source, runs locally (no per-query costs)
- Python Application: Apache-2.0 licensed, free to use
	- Local Processing: Uses your own hardware (no cloud AI fees)

	### PAID Components

	- Vespa Cloud: Pay for storage and search operations
	- ~$0.001 per 1000 searches
	- ~$0.10 per GB storage per month
	- Google Gemini API: Optional, for chat features only
	- ~$0.01 per 1000 image tokens
	- Only used when you ask questions about documents

	### Cost Examples (Monthly)

	- Personal Use (100 documents, 1000 searches): ~$5-10/month
	- Small Business (1000 documents, 10k searches): ~$20-50/month
	- Enterprise (10k+ documents, 100k+ searches): $200+/month
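Applying the per-unit Vespa rates listed above to the personal-use example shows how small the usage-based part is. The sketch counts only binary embeddings (1024 patches × 16 bytes per page) and assumes a hypothetical 10 pages per document; it ignores stored images, baseline resource charges, and Gemini calls, so the monthly estimates above presumably include costs these per-unit rates do not capture. Check current Vespa Cloud pricing before relying on either figure.

```python
SEARCH_USD_PER_1000 = 0.001  # "~$0.001 per 1000 searches" (from the list above)
STORAGE_USD_PER_GB_MONTH = 0.10  # "~$0.10 per GB storage per month"
BYTES_PER_PAGE = 1024 * 16  # 1024 patches x 128 bits, binarized


def usage_cost(searches: int, pages: int) -> float:
    """Monthly usage-based cost in USD from the per-unit rates alone."""
    search_cost = searches / 1000 * SEARCH_USD_PER_1000
    storage_gb = pages * BYTES_PER_PAGE / 1e9
    return search_cost + storage_gb * STORAGE_USD_PER_GB_MONTH


# Personal use: 100 documents (assumed 10 pages each), 1000 searches
print(f"${usage_cost(1000, 100 * 10):.4f}")  # $0.0026
```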

	💡 Cost Optimization Tips:

	- Use local Vespa installation to avoid cloud costs
	- Disable Gemini chat if not needed (saves API costs)
	- Process documents in batches to minimize upload time

	## 🚀 Quick Start

	### Prerequisites

	- Python 3.10+ (< 3.13)
	- 8GB+ RAM for ColPali model
	- Vespa Cloud account or local Vespa installation
	- Google Gemini API key (optional, for chat features)
	- GPU recommended but not required

	### 1. Installation

	```bash
	# Clone the repository
	git clone <repository-url>
	cd colpali-vespa-visual-retrieval

	# Install dependencies
	pip install -e .

	# For development
	pip install -e ".[dev]"

	# For document feeding capabilities
	pip install -e ".[feed]"
	```

	### 2. Environment Configuration

	Create a `.env` file with your configuration:

	```bash
	# Vespa Configuration
	VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
	VESPA_CLOUD_SECRET_TOKEN=your_secret_token

	# Alternative: mTLS Authentication
	USE_MTLS=false
	VESPA_APP_MTLS_URL=https://your-app.vespa-cloud.com
	VESPA_CLOUD_MTLS_KEY="-----BEGIN PRIVATE KEY-----..."
	VESPA_CLOUD_MTLS_CERT="-----BEGIN CERTIFICATE-----..."

	# Optional: Gemini AI (for chat features)
	GEMINI_API_KEY=your_gemini_api_key

	# Optional: Logging
	LOG_LEVEL=INFO
	HOT_RELOAD=false
	```
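These variables select one of two authentication modes. The sketch below shows one way a backend might validate them at startup, failing fast when a required value is missing. `VespaSettings` and `load_vespa_settings` are hypothetical names, and it assumes the `.env` file has already been loaded into the environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class VespaSettings:
    url: str
    token: Optional[str] = None  # token auth
    mtls_key: Optional[str] = None  # mTLS auth (PEM text)
    mtls_cert: Optional[str] = None


def load_vespa_settings(env: Mapping[str, str] = os.environ) -> VespaSettings:
    """Pick token or mTLS auth based on USE_MTLS and check the matching variables exist."""
    use_mtls = env.get("USE_MTLS", "false").strip().lower() in {"1", "true", "yes"}
    required = (
        ("VESPA_APP_MTLS_URL", "VESPA_CLOUD_MTLS_KEY", "VESPA_CLOUD_MTLS_CERT")
        if use_mtls
        else ("VESPA_APP_TOKEN_URL", "VESPA_CLOUD_SECRET_TOKEN")
    )
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    if use_mtls:
        return VespaSettings(
            url=env["VESPA_APP_MTLS_URL"],
            mtls_key=env["VESPA_CLOUD_MTLS_KEY"],
            mtls_cert=env["VESPA_CLOUD_MTLS_CERT"],
        )
    return VespaSettings(url=env["VESPA_APP_TOKEN_URL"], token=env["VESPA_CLOUD_SECRET_TOKEN"])
```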

	### 3. Deploy Vespa Application

	```bash
	# Deploy the Vespa schema and configuration
	python deploy_vespa_app.py \
	--tenant_name your_tenant \
	--vespa_application_name colpalidemo \
	--token_id_write colpalidemo_write \
	--token_id_read colpalidemo_read
	```

	### 4. Run the Application

	```bash
	python main.py
	```

	The application will be available at `http://localhost:7860`

	## 📚 Document Management

	### Uploading Documents

	Use the feeding script to process and upload PDF documents:

	```bash
	python feed_vespa.py \
	--application_name colpalidemo \
	--vespa_schema_name pdf_page
	```

	Document Processing Pipeline (LOCAL → REMOTE):

	1. PDF Download (LOCAL): Your computer downloads PDFs from URLs
	2. PDF Conversion (LOCAL): PDFs converted to images (one per page)
	3. ColPali Processing (LOCAL): Each page processed by ColPali model on YOUR GPU/CPU
	4. Embedding Generation (LOCAL): Visual embeddings created (1024 patches × 128 dimensions)
	5. Binary Encoding (LOCAL): Embeddings converted to efficient binary format
	6. Vespa Upload (REMOTE): Binary embeddings uploaded to Vespa Cloud
	7. Search Indexing (REMOTE): Vespa indexes embeddings for fast retrieval

	⚠️ Important Notes:

	- Processing Time: Expect 5-30 seconds per page depending on your hardware
- Network Usage: Only binary embeddings are uploaded (~16 KB per page, i.e. 1024 patches × 128 bits) vs. ~1 MB for the original page image
	- Privacy: Original PDFs and images stay on your local machine
	- Storage: Raw images cached locally for similarity map generation

	### Supported Operations

	- ✅ Upload Documents: Add new PDFs to the system
	- ✅ Search Documents: Query existing documents
	- ✅ View Documents: Browse stored documents
	- ❌ Remove Documents: _Not currently implemented_
	- ❌ Update Documents: _Not currently implemented_

	## 🔐 Authentication & Security

	### 🛡️ Current Security Implementation

	#### SECURE Components:

	Vespa Authentication (REMOTE)

	- Token Authentication: Bearer tokens for Vespa Cloud API access
	- mTLS Certificates: Mutual TLS for enterprise security
	- Encrypted Communication: HTTPS/TLS for all Vespa connections

	API Key Management (LOCAL)

	- Environment Variables: Sensitive keys stored in `.env` files
	- API Key Rotation: Google Gemini supports key rotation
	- Local Storage: Keys never transmitted except to authorized APIs

	#### LIMITED Security Components:

	Session Management

```python
# Basic UUID session tracking (FastHTML)
import uuid

session["session_id"] = str(uuid.uuid4())
```

```typescript
// HTTP-only cookies (Next.js); cookieStore comes from cookies() in "next/headers"
cookieStore.set(SESSION_KEY, newSessionId, {
  httpOnly: true,
  secure: process.env.NODE_ENV === "production",
  sameSite: "lax",
  maxAge: 60 * 60 * 24 * 30, // 30 days
});
```

	Basic Request Validation

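```python
# HTMX request validation (fragment from a FastHTML route handler;
# RedirectResponse is starlette.responses.RedirectResponse)
if "hx-request" not in request.headers:
    return RedirectResponse("/search")
```

```typescript
// Parameter validation (Next.js route handler)
if (!query) {
  return NextResponse.json({ error: "Query is required" }, { status: 400 });
}
```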

	### ⚠️ Security Limitations & Risks

	#### MISSING Security Features:

	❌ No API Authentication

	- Local API endpoints are completely open
	- No rate limiting or abuse protection
	- No user authentication or authorization
	- Anyone can access `/fetch_results`, `/get_sim_map` endpoints

	❌ No Input Sanitization

```typescript
// Raw user input passed directly to models (Next.js route handler)
const query = searchParams.get("query"); // No validation/sanitization
const ranking = searchParams.get("ranking"); // No input filtering
```

	❌ No Security Headers

	- No CORS configuration
	- No Content Security Policy (CSP)
	- No X-Frame-Options protection
	- No X-Content-Type-Options validation

	❌ No Rate Limiting

	- Unlimited API requests
	- No protection against DoS attacks
	- No query throttling or user limits

	❌ No CSRF Protection

	- No token validation for state-changing operations
	- Cross-site request forgery possible

	### 🎯 Security Recommendations

	#### IMMEDIATE (High Priority)

	1. Add API Authentication

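```typescript
// middleware.ts - Add API key validation
import { NextResponse, type NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  const apiKey = request.headers.get("X-API-Key");
  if (!apiKey || apiKey !== process.env.COLPALI_API_KEY) {
    return new Response("Unauthorized", { status: 401 });
  }
  return NextResponse.next();
}

// Only guard API routes; browsers do not send X-API-Key on page loads (path is an assumption)
export const config = { matcher: "/api/:path*" };
```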

	2. Implement Rate Limiting

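```typescript
// Use next-rate-limit or a local helper such as @/lib/rate-limit (e.g. built on lru-cache)
import rateLimit from "@/lib/rate-limit";

const limiter = rateLimit({
  interval: 60 * 1000, // 1 minute window
  uniqueTokenPerInterval: 500, // Max distinct IPs tracked per window
});

// getClientIP is a helper you supply; check()'s exact signature depends on your rate-limit helper
await limiter.check(10, getClientIP(request)); // 10 requests per minute per IP
```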

	3. Add Security Headers

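```javascript
// next.config.js
const securityHeaders = [
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
  {
    key: "Content-Security-Policy",
    // 'unsafe-inline' weakens XSS protection; drop it once inline scripts are removed
    value: "default-src 'self'; script-src 'self' 'unsafe-inline'",
  },
];

module.exports = {
  // Apply the headers to every route
  async headers() {
    return [{ source: "/:path*", headers: securityHeaders }];
  },
};
```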

	4. Input Validation & Sanitization

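```typescript
import { z } from "zod";

const SearchSchema = z.object({
  query: z
    .string()
    .min(1)
    .max(500)
    // Note: this allow-list also rejects commas, apostrophes, hyphens, and non-ASCII text
    .regex(/^[a-zA-Z0-9\s\.\?\!]*$/),
  ranking: z.enum(["hybrid", "colpali", "bm25"]),
});

// Returns { success: false, error } instead of throwing on invalid input
const parsed = SearchSchema.safeParse({ query, ranking });
```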

	#### MEDIUM Priority

	5. CORS Configuration

```typescript
// Restrict origins to known domains
const corsHeaders = {
  "Access-Control-Allow-Origin": "https://yourdomain.com",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  // Include X-API-Key so browsers allow the custom header in cross-origin requests
  "Access-Control-Allow-Headers": "Content-Type, Authorization, X-API-Key",
};
```

	6. Request Size Limits

```typescript
// Limit request payload sizes (Pages Router API routes, e.g. pages/api/*.ts)
export const config = {
  api: {
    bodyParser: {
      sizeLimit: "1mb",
    },
  },
};
```

	7. Audit Logging

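```python
# Log all API access with IP, endpoint, and a truncated query; timestamps come from the log formatter
# (logger = logging.getLogger(__name__); client_ip, endpoint, query are taken from the request)
logger.info("API_ACCESS: %s - %s - %s", client_ip, endpoint, query[:100])
```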

	#### LONG-TERM (Production Ready)

	8. User Authentication (Optional)

	```typescript
	// Add NextAuth.js or similar for user accounts
	// Implement role-based access control
	// Add document ownership and permissions
	```

	9. Network Security

	```bash
	# Deploy behind reverse proxy (nginx/cloudflare)
	# Enable DDoS protection
	# Use Web Application Firewall (WAF)
	```

	10. Data Privacy Controls

	```typescript
	// Implement data retention policies
	// Add user data deletion capabilities
	// GDPR compliance features
	```

	### 🔒 Security Best Practices

	#### For LOCAL Development:

	- Never commit API keys to version control
- Use specific, namespaced variable names (e.g. `COLPALI_API_KEY` rather than a generic `API_KEY`)
	- Rotate API keys regularly (monthly)
	- Enable firewall on development machines
	- Use HTTPS even locally for production testing

	#### For PRODUCTION Deployment:

	- Deploy behind CDN/WAF (Cloudflare, AWS Shield)
	- Enable rate limiting at infrastructure level
	- Use container security scanning
	- Implement monitoring and alerting
	- Regular security audits and penetration testing

	#### For REMOTE Services:

	- Vespa Cloud: Follows enterprise security standards
	- Gemini API: Google-managed security and compliance
	- Environment Isolation: Separate dev/staging/prod credentials

	### 🚨 Current Risk Level: MEDIUM

	Suitable for:

	- ✅ Personal projects and demos
	- ✅ Internal company tools (behind firewall)
	- ✅ Research and development environments

	NOT suitable for:

	- ❌ Public internet deployment
	- ❌ Customer-facing applications
	- ❌ Production environments with sensitive data
	- ❌ Commercial applications without security hardening

	## 🎯 Usage Guide

	### Basic Search

	1. Navigate to the homepage
	2. Enter your search query in natural language
	3. Select ranking method (hybrid, semantic, etc.)
	4. View results with similarity maps

	### Similarity Maps

	- Click on token buttons to see which parts of documents match specific query terms
	- Visual heatmaps show attention patterns
	- Reset button returns to original document view
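A similarity map for one query token can be thought of as that token's dot product with every image-patch embedding, laid out on the patch grid (ColPali's 1024 patches form a 32 × 32 grid for its 448 × 448 input) and rescaled to 0-1 for the heatmap. The sketch assumes row-major patch order and min-max scaling; `similarity_map` is a hypothetical helper, and the app's own generator may normalize differently.

```python
def similarity_map(token: list[float], patches: list[list[float]], grid: int = 32) -> list[list[float]]:
    """Score every patch against one query token and return a grid x grid heatmap in [0, 1]."""
    if len(patches) != grid * grid:
        raise ValueError(f"expected {grid * grid} patches, got {len(patches)}")
    scores = [sum(t * p for t, p in zip(token, patch)) for patch in patches]
    low, high = min(scores), max(scores)
    span = (high - low) or 1.0  # avoid dividing by zero when all scores are equal
    scaled = [(s - low) / span for s in scores]
    # Patches are assumed to be in row-major order, so each slice of `grid` is one row
    return [scaled[row * grid:(row + 1) * grid] for row in range(grid)]


heatmap = similarity_map([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [2.0, 0.0]], grid=2)
print(heatmap)  # [[0.0, 0.5], [0.25, 1.0]]
```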

	### AI Chat

	- Ask questions about retrieved documents
	- Chat responses are based on document content
	- Streaming responses for real-time interaction

	### Search Rankings

	- Hybrid: Combines multiple ranking signals
	- Semantic: Pure semantic similarity
	- BM25: Traditional text-based ranking
	- ColPali: Visual-first ranking

	## 🛠️ Development

	### Project Structure

```
├── main.py                 # Application entry point
├── backend/
│   ├── colpali.py          # ColPali model integration
│   ├── vespa_app.py        # Vespa client and queries
│   └── modelmanager.py     # Model management utilities
├── frontend/
│   ├── app.py              # UI components
│   └── layout.py           # Layout templates
├── feed_vespa.py           # Document upload script
├── deploy_vespa_app.py     # Vespa deployment script
├── colpali-with-snippets/  # Vespa schema definitions
└── static/                 # Static assets and generated files
```

	### Running in Development

	```bash
	# Enable hot reload
	export HOT_RELOAD=true
	python main.py

	# Or set in .env
	echo "HOT_RELOAD=true" >> .env
	```

	### Code Quality

	```bash
	# Format code
	ruff format .

	# Lint code
	ruff check .
	```

	## 📊 API Endpoints

	### Current API Routes (⚠️ UNSECURED)

| Endpoint         | Method | Description             | Security Status  |
| ---------------- | ------ | ----------------------- | ---------------- |
| `/`              | GET    | Homepage                | ✅ Public (safe) |
| `/search`        | GET    | Search interface        | ✅ Public (safe) |
| `/fetch_results` | GET    | Fetch search results    | ⚠️ OPEN API      |
| `/get_sim_map`   | GET    | Get similarity maps     | ⚠️ OPEN API      |
| `/get-message`   | GET    | Chat with AI (SSE)      | ⚠️ OPEN API      |
| `/full_image`    | GET    | Get full document image | ⚠️ OPEN API      |
| `/suggestions`   | GET    | Query autocomplete      | ⚠️ OPEN API      |
| `/static/*`      | GET    | Static file serving     | ✅ Public (safe) |

	### Security Analysis by Endpoint

	#### 🔒 SECURE Endpoints

	- `/` and `/search`: Static HTML pages, no sensitive data
- `/static/*`: Public assets (CSS, JS, images)

	#### ⚠️ UNSECURED Endpoints (Risk)

	`/fetch_results` - HIGH RISK

	```bash
	# Anyone can perform unlimited searches
	curl "http://localhost:7860/fetch_results?query=secret&ranking=hybrid"
	```

	- Risks: Resource abuse, server overload, competitive intelligence gathering
	- Exposes: Search capabilities, document metadata, processing times

	`/get_sim_map` - MEDIUM RISK

	```bash
	# Access similarity maps without authentication
	curl "http://localhost:7860/get_sim_map?query_id=123&idx=0&token=word&token_idx=5"
	```

	- Risks: Unauthorized access to visual analysis
	- Exposes: Document visual patterns, query insights

	`/get-message` - HIGH RISK

	```bash
	# Trigger AI processing without limits
	curl "http://localhost:7860/get-message?query_id=123&query=question&doc_ids=doc1,doc2"
	```

	- Risks: Gemini API abuse, cost exploitation, resource exhaustion
	- Exposes: AI-generated insights, document content analysis

	`/full_image` - HIGH RISK

	```bash
	# Download any document image
	curl "http://localhost:7860/full_image?doc_id=any_document_id"
	```

	- Risks: Unauthorized document access, data leakage
	- Exposes: Full document images, potentially sensitive content

	### Immediate Security Fixes

	#### 1. Add API Key Authentication

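```python
import hmac
import os
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

async def verify_api_key(request, call_next):
    # FastHTML apps are Starlette apps, so Starlette middleware applies
    if request.url.path.startswith("/fetch_results"):
        api_key = request.headers.get("X-API-Key", "")
        expected = os.getenv("COLPALI_API_KEY", "")
        # Constant-time comparison; reject everything if no key is configured
        if not expected or not hmac.compare_digest(api_key.encode(), expected.encode()):
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)

app.add_middleware(BaseHTTPMiddleware, dispatch=verify_api_key)
```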

	#### 2. Implement Rate Limiting

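```python
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter  # slowapi looks the limiter up on app.state
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)  # returns HTTP 429

@rt("/fetch_results")
@limiter.limit("10/minute")  # 10 requests per minute per IP; needs a `request` parameter
async def get_results(request, query: str, ranking: str):
    ...  # existing code
```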
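To see what a limit such as "10/minute" means in practice, here is a minimal in-memory sliding-window limiter. It is an illustrative alternative to slowapi, not part of it; it only works within a single process (multiple workers would each keep their own counts), and `SlidingWindowLimiter` is a hypothetical name. The clock is injectable so the behavior can be tested.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller should respond with HTTP 429
        hits.append(now)
        return True
```

In a Starlette or FastHTML handler you would call `limiter.allow(request.client.host)` (note `request.client` can be `None` behind some proxies) and return a 429 response when it returns `False`.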

	#### 3. Input Validation

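```python
from pydantic import BaseModel, field_validator  # Pydantic v2 API

MAX_QUERY_LENGTH = 500


class SearchRequest(BaseModel):
    query: str
    ranking: str

    @field_validator("query")
    @classmethod
    def query_must_be_safe(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError("Query is required")
        if len(v) > MAX_QUERY_LENGTH:
            raise ValueError("Query too long")
        # Add sanitization logic here
        return v
```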

	#### 4. Request Origin Validation

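```python
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

ALLOWED_ORIGINS = {"http://localhost:3000", "https://yourdomain.com"}

async def origin_check(request, call_next):
    origin = request.headers.get("origin")
    # Browsers usually omit Origin on same-origin GET requests, so only reject
    # requests that declare a foreign origin (non-browser clients can forge it)
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return JSONResponse({"error": "Forbidden"}, status_code=403)
    return await call_next(request)

app.add_middleware(BaseHTTPMiddleware, dispatch=origin_check)
```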

	### 📈 Recommended API Security Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │  Rate Limiter   │    │   Backend API   │
│                 │    │                 │    │                 │
│ • API Key       │◄──►│ • IP Limiting   │◄──►│ • Input Valid.  │
│ • CORS Headers  │    │ • User Quotas   │    │ • Auth Checks   │
│ • Request Valid.│    │ • DoS Protection│    │ • Audit Logs    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

	Benefits:

- Layer 1: Frontend validates requests before sending (a usability check only; clients can bypass it)
	- Layer 2: Rate limiter prevents abuse and DoS attacks
	- Layer 3: Backend performs final validation and authorization

	### 🔒 Security Implementation Checklist

	#### Before Production Deployment:

	CRITICAL (Must Do):

	- [ ] Generate API Key: Create strong API key for endpoint authentication
	- [ ] Enable Rate Limiting: Implement per-IP request limits
	- [ ] Add Security Headers: X-Frame-Options, CSP, X-Content-Type-Options
	- [ ] Input Validation: Sanitize all user inputs (query, ranking)
	- [ ] CORS Configuration: Restrict origins to known domains only
	- [ ] Environment Security: Never commit API keys, use secure .env
	- [ ] HTTPS Only: Force TLS in production (no HTTP)

	HIGH Priority:

	- [ ] Audit Logging: Log all API requests with IP and timestamp
	- [ ] Request Size Limits: Prevent large payload attacks
	- [ ] Error Handling: Don't expose stack traces or internal details
	- [ ] Session Security: HTTP-only, secure, SameSite cookies
	- [ ] API Documentation: Document authentication requirements

	MEDIUM Priority:

	- [ ] User Authentication: Consider adding user accounts for access control
	- [ ] Request Timeout: Prevent long-running request abuse
	- [ ] Content Validation: Verify response content types
	- [ ] Monitoring: Set up alerts for unusual API usage patterns
	- [ ] Backup Strategy: Secure backup of environment variables

	#### Security Testing Commands:

	Test API Authentication:

	```bash
	# Should fail without API key
	curl "http://localhost:7860/fetch_results?query=test&ranking=hybrid"

	# Should succeed with API key
	curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test&ranking=hybrid"
	```

	Test Rate Limiting:

```bash
# Run multiple requests to trigger the rate limit; expect HTTP 429 once the limit is reached
for i in {1..15}; do
  curl -s -o /dev/null -w "Request $i: %{http_code}\n" \
    -H "X-API-Key: your_api_key" \
    "http://localhost:7860/fetch_results?query=test$i&ranking=hybrid"
done
```

	Test Input Validation:

```bash
# Should reject invalid/malicious inputs (-G puts the URL-encoded values in the query string)
curl -G -H "X-API-Key: your_api_key" --data-urlencode "query=<script>alert('xss')</script>" \
  --data-urlencode "ranking=invalid" "http://localhost:7860/fetch_results"
```

	Test Security Headers:

	```bash
	# Check security headers in response
	curl -I "http://localhost:7860/"
	# Should see: X-Frame-Options, X-Content-Type-Options, etc.
	```

	#### Security Monitoring:

	Log Analysis Queries:

	```bash
	# Monitor API usage patterns
grep "API_ACCESS" /var/log/colpali.log | tail -100

	# Detect potential abuse
	grep "RATE_LIMIT_EXCEEDED" /var/log/colpali.log

	# Check authentication failures
	grep "UNAUTHORIZED" /var/log/colpali.log
	```
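The greps above assume the application writes one line per event containing markers such as `API_ACCESS`, `RATE_LIMIT_EXCEEDED`, and `UNAUTHORIZED`. A standard-library sketch of such a logger follows; the marker names come from the commands above, while `build_audit_logger` and `audit` are hypothetical helpers and the log path is whatever you pass in.

```python
import logging


def build_audit_logger(path: str) -> logging.Logger:
    """Return a logger that appends timestamped audit lines to `path`."""
    logger = logging.getLogger("colpali.audit")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers when called twice
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def audit(logger: logging.Logger, event: str, client_ip: str, endpoint: str, query: str = "") -> None:
    """Write one grep-friendly line, e.g. 'API_ACCESS: 1.2.3.4 - /fetch_results - test'."""
    # Truncate queries so logs stay small and long user input is not stored verbatim
    logger.info("%s: %s - %s - %s", event, client_ip, endpoint, query[:100])
```

Usage: `audit(logger, "RATE_LIMIT_EXCEEDED", ip, "/fetch_results")` from wherever the limiter rejects a request.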

	Alerting Setup:

	- Rate Limit Violations: Alert when >50 requests/minute from single IP
	- Authentication Failures: Alert on repeated unauthorized attempts
	- Unusual Queries: Alert on suspicious query patterns or injection attempts
	- Resource Usage: Alert on high CPU/memory usage (potential DoS)

	## 🧪 Models Used

	- ColPali v1.2: Visual document understanding
	- ColPaliGemma 3B: Base visual-language model
	- Google Gemini 2.0: AI chat and question answering

	## 🔧 Configuration Options

	### Environment Variables

| Variable                   | Required | Description                                 | Security Impact                 |
| -------------------------- | -------- | ------------------------------------------- | ------------------------------- |
| `VESPA_APP_TOKEN_URL`      | Yes\*    | Vespa application URL (token auth)          | HIGH - Remote access            |
| `VESPA_CLOUD_SECRET_TOKEN` | Yes\*    | Vespa secret token                          | CRITICAL - Full database access |
| `USE_MTLS`                 | No       | Use mTLS instead of token auth              | MEDIUM - Auth method            |
| `VESPA_APP_MTLS_URL`       | Yes\*\*  | Vespa application URL (mTLS)                | HIGH - Remote access            |
| `VESPA_CLOUD_MTLS_KEY`     | Yes\*\*  | mTLS private key                            | CRITICAL - TLS credentials      |
| `VESPA_CLOUD_MTLS_CERT`    | Yes\*\*  | mTLS certificate                            | HIGH - TLS credentials          |
| `GEMINI_API_KEY`           | No       | Google Gemini API key                       | HIGH - AI access/costs          |
| `LOG_LEVEL`                | No       | Logging level (DEBUG, INFO, WARNING, ERROR) | LOW - Debug info                |
| `HOT_RELOAD`               | No       | Enable hot reload in development            | LOW - Dev convenience           |

	#### 🔒 Security-Related Environment Variables (Recommended)

| Variable                   | Required  | Description                          | Default |
| -------------------------- | --------- | ------------------------------------ | ------- |
| `COLPALI_API_KEY`          | Yes\*\*\* | API key for endpoint authentication  | None    |
| `ALLOWED_ORIGINS`          | Yes\*\*\* | Comma-separated allowed CORS origins | None    |
| `RATE_LIMIT_REQUESTS`      | No        | Max requests per IP per window       | `10`    |
| `RATE_LIMIT_WINDOW`        | No        | Rate limit window in seconds         | `60`    |
| `MAX_QUERY_LENGTH`         | No        | Maximum query string length          | `500`   |
| `ENABLE_AUDIT_LOGGING`     | No        | Log all API requests for security    | `false` |
| `SECURITY_HEADERS_ENABLED` | No        | Enable security headers              | `true`  |
| `CSRF_SECRET`              | Yes\*\*\* | Secret for CSRF token generation     | None    |

	Example Security-Enhanced `.env`:

	```bash
	# Existing configuration
	VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
	VESPA_CLOUD_SECRET_TOKEN=your_vespa_secret_token
	GEMINI_API_KEY=your_gemini_api_key

	# NEW: Security configuration
	COLPALI_API_KEY=your_strong_random_api_key_here
	ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
	RATE_LIMIT_REQUESTS=10
	RATE_LIMIT_WINDOW=60
	MAX_QUERY_LENGTH=500
	ENABLE_AUDIT_LOGGING=true
	SECURITY_HEADERS_ENABLED=true
	CSRF_SECRET=your_random_csrf_secret_here

	# Development vs Production
	NODE_ENV=production # Enable secure cookies
	LOG_LEVEL=INFO # Don't expose debug info in production
	```

\* Required for token authentication

\*\* Required for mTLS authentication

\*\*\* Required for production security
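One way to read the recommended security variables, applying the defaults from the table, is sketched below. `SecuritySettings` and `load_security_settings` are hypothetical; the variables themselves are recommendations in this document, not settings the current code is known to read.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping

REQUIRED = ("COLPALI_API_KEY", "ALLOWED_ORIGINS", "CSRF_SECRET")


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class SecuritySettings:
    api_key: str
    csrf_secret: str
    allowed_origins: List[str] = field(default_factory=list)
    rate_limit_requests: int = 10
    rate_limit_window: int = 60
    max_query_length: int = 500
    audit_logging: bool = False
    security_headers: bool = True


def load_security_settings(env: Mapping[str, str] = os.environ) -> SecuritySettings:
    """Read the recommended variables, applying the defaults from the table above."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return SecuritySettings(
        api_key=env["COLPALI_API_KEY"],
        csrf_secret=env["CSRF_SECRET"],
        # "http://localhost:3000,https://yourdomain.com" -> list of origins
        allowed_origins=[o.strip() for o in env["ALLOWED_ORIGINS"].split(",") if o.strip()],
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "10")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
        max_query_length=int(env.get("MAX_QUERY_LENGTH", "500")),
        audit_logging=_as_bool(env.get("ENABLE_AUDIT_LOGGING", "false")),
        security_headers=_as_bool(env.get("SECURITY_HEADERS_ENABLED", "true")),
    )
```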

	## 🚨 Troubleshooting

	### LOCAL Processing Issues

	ColPali model fails to load:

	```bash
	# Check GPU memory
	nvidia-smi # For NVIDIA GPUs
	# or
	system_profiler SPDisplaysDataType # For Apple Silicon

	# Clear model cache if corrupted
	rm -rf ~/.cache/huggingface/hub/models--vidore--colpali-v1.2
	```

	Out of memory errors:

	- Reduce batch size in `feed_vespa.py` (try `batch_size=1`)
	- Close other applications to free RAM/VRAM
	- Use CPU processing if GPU memory insufficient: `CUDA_VISIBLE_DEVICES="" python main.py`
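`batch_size` controls how many page images go through ColPali in one forward pass, and therefore peak memory. A sketch of the chunking itself follows (Python 3.12 has `itertools.batched`, but this project supports 3.10+); `batched` here is a hypothetical helper, not necessarily what `feed_vespa.py` uses.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items; smaller batches use less RAM/VRAM."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


print(list(batched(["p1", "p2", "p3", "p4", "p5"], 2)))  # [['p1', 'p2'], ['p3', 'p4'], ['p5']]
```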

	Slow processing on CPU:

	- Expected behavior - ColPali requires significant computation
	- Consider upgrading to GPU or Apple Silicon for 5-10x speedup
	- Process documents overnight for large collections

	### REMOTE Processing Issues

	Connection to Vespa fails:

	- Verify your Vespa URL and credentials in `.env`
	- Check if the Vespa application is deployed and running
	- Ensure network connectivity: `ping your-app.vespa-cloud.com`
	- Validate authentication tokens haven't expired

	Document upload fails:

	- Check Vespa Cloud storage quota and billing
	- Verify embedding format matches Vespa schema
	- Ensure stable internet connection for large uploads

	Search returns no results:

	- Confirm documents were successfully uploaded to Vespa
	- Check if embeddings were properly indexed
	- Verify query processing isn't failing locally

	### MIXED (Local + Remote) Issues

	Chat features don't work:

	- LOCAL: Verify document images are being generated locally
	- REMOTE: Check `GEMINI_API_KEY` is set correctly
	- REMOTE: Verify Gemini API quota and billing
	- NETWORK: Ensure images can be sent to Gemini API

	Similarity maps missing:

	- LOCAL: Confirm ColPali model loaded successfully
	- LOCAL: Check if similarity map generation completed
	- REMOTE: Verify Vespa returned similarity data
	- BROWSER: Clear browser cache for static files

	### Performance Tips

	LOCAL Optimization:

	- Use GPU acceleration for 5-10x faster model inference
	- Optimize batch sizes based on available memory
	- Use SSD storage for faster model loading
	- Consider quantized models for lower memory usage

	REMOTE Optimization:

	- Use Vespa's HNSW indexing for faster search
	- Optimize embedding dimensions vs accuracy tradeoff
	- Enable compression for faster network transfer
	- Use multiple Vespa instances for high availability

	NETWORK Optimization:

	- Process documents in batches to reduce upload overhead
	- Use compression for embedding transfer
	- Consider regional Vespa deployment for lower latency

	## 📄 License

	Apache-2.0

	## 🤝 Contributing

	1. Fork the repository
	2. Create a feature branch
	3. Make your changes
	4. Run tests and linting
	5. Submit a pull request

	## 📞 Support

	For issues and questions:

	- Check the troubleshooting section
	- Review Vespa and ColPali documentation
	- Open an issue on the repository