# ColPali 🤝 Vespa - Visual Retrieval System
A visual document retrieval system that combines **ColPali** (a ColBERT-style late-interaction retriever built on the PaliGemma vision-language model) with **Vespa** for scalable document search and question answering.
## 🌟 Features
### 🔍 **Visual Document Search**
- **Multi-modal retrieval**: Search through PDF documents using natural language queries
- **Visual understanding**: ColPali model processes document images and text simultaneously
- **Token-level similarity maps**: Visualize exactly which parts of documents match your query
- **Multiple ranking algorithms**: Choose between hybrid, semantic, BM25, and ColPali ranking
### 🧠 **AI-Powered Chat**
- **Intelligent Q&A**: Ask questions about retrieved documents using Google Gemini 2.0
- **Context-aware responses**: AI analyzes document images to provide accurate answers
- **Real-time streaming**: Get responses as they're generated
### ⚡ **Scalable Infrastructure**
- **Vespa integration**: Enterprise-grade search platform for large document collections
- **Real-time processing**: Instant search results and similarity map generation
- **Cloud-ready**: Supports Vespa Cloud deployment with secure authentication
## 🏗️ Architecture
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend     │    │   Vespa Cloud   │
│    (Browser)    │    │   (Your Local   │    │    (Remote)     │
│                 │    │    Computer)    │    │                 │
│ • Search UI     │◄──►│ • ColPali Model │◄──►│ • Document Store│
│ • Similarity    │    │ • Query Proc.   │    │ • Vector Search │
│   Maps          │    │ • Sim Map Gen.  │    │ • Ranking       │
│ • Chat Interface│    │ • Gemini Int.   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         ↑                      ↑                      ↑
    Web Browser             LOCAL AI            REMOTE Storage
```
### 🏠 **LOCAL Processing (Your Computer)**
**All ColPali model inference happens on your local machine:**
- **ColPali Model**: Runs locally on your GPU/CPU (~7GB model)
- **Document Processing**: PDF → Images → Embeddings (local)
- **Query Processing**: Text → Embeddings (local)
- **Similarity Maps**: Visual attention generation (local)
- **Gemini Chat**: Retrieved page images are prepared locally, then sent to the Gemini API (Gemini inference runs remotely)
**Device Detection:**
```python
device = get_torch_device("auto") # Detects: CUDA, MPS (Apple), or CPU
print(f"Using device: {device}") # Shows YOUR hardware
```
### ☁️ **REMOTE Processing (Vespa Cloud)**
**Only storage and search index operations happen remotely:**
- **Document Storage**: Stores processed embeddings (not raw models)
- **Vector Search**: Fast similarity search across document collection
- **Query Routing**: Handles search requests and ranking
- **Metadata Storage**: Document titles, URLs, page numbers
### 🔄 **Complete Data Flow**
#### **Document Upload Process:**
1. **LOCAL**: Your computer downloads PDF from URL
2. **LOCAL**: ColPali converts PDF pages to images
3. **LOCAL**: ColPali generates visual embeddings (1024 patches × 128 dims)
4. **LOCAL**: Embeddings converted to binary format for efficiency
5. **REMOTE**: Binary embeddings uploaded to Vespa Cloud
6. **REMOTE**: Vespa indexes embeddings for fast search
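Step 4 can be illustrated with a minimal sketch of sign-based binarization. It assumes each dimension is thresholded at zero and packed most-significant-bit first; the project's actual encoding is not shown in this README and may differ. The names `binarize` and `page_size_bytes` are hypothetical.

```python
from typing import Sequence

PATCHES_PER_PAGE = 1024  # figures taken from step 3 above
DIMS_PER_PATCH = 128


def binarize(vector: Sequence[float]) -> bytes:
    """Pack the sign of each dimension into one bit (MSB first).

    Assumption: positive values map to 1, everything else to 0.
    """
    if len(vector) % 8 != 0:
        raise ValueError("vector length must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(vector), 8):
        byte = 0
        for value in vector[start:start + 8]:
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def page_size_bytes(bytes_per_dim: float) -> int:
    """Size of one page's embeddings for a given per-dimension storage cost."""
    return int(PATCHES_PER_PAGE * DIMS_PER_PATCH * bytes_per_dim)


float32_size = page_size_bytes(4)     # 524288 bytes (512 KiB)
binary_size = page_size_bytes(1 / 8)  # 16384 bytes (16 KiB)
```

Under these assumptions a 128-dimensional patch vector shrinks to 16 bytes, so one page drops from 512 KiB of float32 values to 16 KiB of packed bits, a 32x reduction.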
#### **Search Query Process:**
1. **LOCAL**: You enter search query in browser
2. **LOCAL**: ColPali processes query → generates query embeddings
3. **REMOTE**: Query embeddings sent to Vespa Cloud
4. **REMOTE**: Vespa searches document index, returns matches
5. **LOCAL**: ColPali generates similarity maps for results
6. **BROWSER**: Results displayed with visual attention maps
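The matching in step 4 rests on late interaction. As a rough sketch, the standard ColBERT-style MaxSim score takes, for each query token, its best-matching page patch and sums those maxima. This is the general technique, not the project's Vespa rank profile, which may use approximations such as Hamming distance over binary vectors. All names below are illustrative.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the similarity to the best-matching patch."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


# Toy 2-dimensional example: two query tokens, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8]],
    "page_b": [[0.5, 0.5], [0.4, 0.4]],
}
ranked = sorted(pages, key=lambda name: maxsim_score(query, pages[name]), reverse=True)
```

Here `page_a` ranks first because each query token finds a closely aligned patch on it, while `page_b` only offers mediocre matches for both tokens.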
#### **AI Chat Process:**
1. **LOCAL**: Retrieved document images processed by your machine
2. **REMOTE**: Images + query sent to Google Gemini API
3. **REMOTE**: Gemini generates response based on visual content
4. **BROWSER**: Streaming response displayed in real-time
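The streaming in step 4 is delivered over Server-Sent Events. A minimal sketch of how text chunks could be framed as SSE messages follows; the `close` event name is a hypothetical end-of-stream marker, and the real application may signal completion differently.

```python
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap text chunks as Server-Sent Events frames."""
    for chunk in chunks:
        # A multi-line payload needs one "data:" field per line.
        data_lines = "\n".join(f"data: {line}" for line in chunk.split("\n"))
        # A blank line terminates each event.
        yield f"{data_lines}\n\n"
    # Hypothetical end-of-stream marker so the browser can stop listening.
    yield "event: close\ndata: \n\n"


frames = list(sse_events(["Hello", "line one\nline two"]))
```

Each frame can be written to the response as soon as the model produces the chunk, which is what lets the browser render the answer incrementally.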
### Core Components
- **ColPali Model**: Visual-language model for document understanding (LOCAL)
- **Vespa Search**: Distributed search and storage engine (REMOTE)
- **FastHTML Frontend**: Modern, responsive web interface (BROWSER)
- **Gemini Integration**: AI-powered question answering (REMOTE API)
- **Similarity Map Generator**: Visual attention visualization (LOCAL)
## 💻 **System Requirements**
### **LOCAL Machine Requirements (For AI Processing)**
**Minimum:**
- **CPU**: Modern multi-core processor (Intel/AMD/Apple Silicon)
- **RAM**: 8GB+ (16GB recommended)
- **Storage**: 10GB free space (for model cache)
- **Python**: 3.10+ (< 3.13)
**Recommended:**
- **GPU**: NVIDIA GPU with 8GB+ VRAM (RTX 3070/4060 or better)
- **Apple**: M1/M2/M3 Mac (uses Metal Performance Shaders)
- **RAM**: 16GB+ for smoother processing
- **Storage**: SSD for faster model loading
**Performance Examples:**
- **RTX 4090**: ~1-2 seconds per query
- **RTX 3070**: ~3-5 seconds per query
- **Apple M2**: ~4-6 seconds per query
- **CPU Only**: ~15-30 seconds per query
### **REMOTE Requirements (Vespa Cloud)**
**What you need:**
- **Vespa Cloud account** (handles all remote processing)
- **Internet connection** (for uploading embeddings and search queries)
- **Authentication tokens** (provided by Vespa Cloud)
**What Vespa Cloud provides:**
- **Scalable storage** for any number of documents
- **Sub-second search** across millions of embeddings
- **High availability** with automatic failover
- **Multi-region deployment options** for lower latency to distant users
## 💰 **Cost Breakdown**
### **FREE Components**
- **ColPali Model**: Open source, runs locally (no per-query costs)
- **Python Application**: Apache-2.0 licensed, free to use
- **Local Processing**: Uses your own hardware (no cloud AI fees)
### **PAID Components**
- **Vespa Cloud**: Pay for storage and search operations
  - ~$0.001 per 1000 searches
  - ~$0.10 per GB storage per month
- **Google Gemini API**: Optional, for chat features only
  - ~$0.01 per 1000 image tokens
  - Only used when you ask questions about documents
### **Cost Examples (Monthly)**
- **Personal Use** (100 documents, 1000 searches): ~$5-10/month
- **Small Business** (1000 documents, 10k searches): ~$20-50/month
- **Enterprise** (10k+ documents, 100k+ searches): $200+/month
**💡 Cost Optimization Tips:**
- Use local Vespa installation to avoid cloud costs
- Disable Gemini chat if not needed (saves API costs)
- Process documents in batches to minimize upload time
## 🚀 Quick Start
### Prerequisites
- Python 3.10+ (< 3.13)
- **8GB+ RAM** for ColPali model
- **Vespa Cloud account** or local Vespa installation
- **Google Gemini API key** (optional, for chat features)
- **GPU recommended** but not required
### 1. Installation
```bash
# Clone the repository
git clone <repository-url>
cd colpali-vespa-visual-retrieval
# Install dependencies
pip install -e .
# For development
pip install -e ".[dev]"
# For document feeding capabilities
pip install -e ".[feed]"
```
### 2. Environment Configuration
Create a `.env` file with your configuration:
```bash
# Vespa Configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_secret_token
# Alternative: mTLS Authentication
USE_MTLS=false
VESPA_APP_MTLS_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_MTLS_KEY="-----BEGIN PRIVATE KEY-----..."
VESPA_CLOUD_MTLS_CERT="-----BEGIN CERTIFICATE-----..."
# Optional: Gemini AI (for chat features)
GEMINI_API_KEY=your_gemini_api_key
# Optional: Logging
LOG_LEVEL=INFO
HOT_RELOAD=false
```
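As a rough illustration of how these variables fit together, the sketch below picks the authentication mode from `USE_MTLS` and checks that the variables for that mode are present. The function name `load_vespa_settings` and the returned dictionary shape are hypothetical; the application's own configuration code may work differently.

```python
import os
from typing import Mapping


def load_vespa_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Pick the Vespa auth mode from USE_MTLS and check its required variables."""
    use_mtls = env.get("USE_MTLS", "false").strip().lower() == "true"
    required = (
        ["VESPA_APP_MTLS_URL", "VESPA_CLOUD_MTLS_KEY", "VESPA_CLOUD_MTLS_CERT"]
        if use_mtls
        else ["VESPA_APP_TOKEN_URL", "VESPA_CLOUD_SECRET_TOKEN"]
    )
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    settings = {name: env[name] for name in required}
    settings["auth"] = "mtls" if use_mtls else "token"
    # Chat is optional, so a missing Gemini key only disables that feature.
    settings["gemini_enabled"] = bool(env.get("GEMINI_API_KEY"))
    return settings
```

Failing fast at startup with the list of missing names tends to be easier to debug than a connection error that surfaces on the first query.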
### 3. Deploy Vespa Application
```bash
# Deploy the Vespa schema and configuration
python deploy_vespa_app.py \
--tenant_name your_tenant \
--vespa_application_name colpalidemo \
--token_id_write colpalidemo_write \
--token_id_read colpalidemo_read
```
### 4. Run the Application
```bash
python main.py
```
The application will be available at `http://localhost:7860`
## 📚 Document Management
### Uploading Documents
Use the feeding script to process and upload PDF documents:
```bash
python feed_vespa.py \
--application_name colpalidemo \
--vespa_schema_name pdf_page
```
**Document Processing Pipeline (LOCAL → REMOTE):**
1. **PDF Download** (LOCAL): Your computer downloads PDFs from URLs
2. **PDF Conversion** (LOCAL): PDFs converted to images (one per page)
3. **ColPali Processing** (LOCAL): Each page processed by ColPali model on YOUR GPU/CPU
4. **Embedding Generation** (LOCAL): Visual embeddings created (1024 patches × 128 dimensions)
5. **Binary Encoding** (LOCAL): Embeddings converted to efficient binary format
6. **Vespa Upload** (REMOTE): Binary embeddings uploaded to Vespa Cloud
7. **Search Indexing** (REMOTE): Vespa indexes embeddings for fast retrieval
**⚠️ Important Notes:**
- **Processing Time**: Expect 5-30 seconds per page depending on your hardware
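- **Network Usage**: Only final embeddings uploaded (~16KB per page at 1024 patches × 128 bits, vs ~1MB original)
- **Privacy**: Original PDFs and images stay on your local machine, except that the optional chat feature sends retrieved page images to the Gemini API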
- **Storage**: Raw images cached locally for similarity map generation
### Supported Operations
- ✅ **Upload Documents**: Add new PDFs to the system
- ✅ **Search Documents**: Query existing documents
- ✅ **View Documents**: Browse stored documents
- ❌ **Remove Documents**: _Not currently implemented_
- ❌ **Update Documents**: _Not currently implemented_
## 🔐 Authentication & Security
### 🛡️ **Current Security Implementation**
#### **SECURE Components:**
**Vespa Authentication (REMOTE)**
- **Token Authentication**: Bearer tokens for Vespa Cloud API access
- **mTLS Certificates**: Mutual TLS for enterprise security
- **Encrypted Communication**: HTTPS/TLS for all Vespa connections
**API Key Management (LOCAL)**
- **Environment Variables**: Sensitive keys stored in `.env` files
- **API Key Rotation**: Google Gemini supports key rotation
- **Local Storage**: Keys never transmitted except to authorized APIs
#### **LIMITED Security Components:**
**Session Management**
```python
import uuid

# Basic UUID session tracking (FastHTML)
session["session_id"] = str(uuid.uuid4())
```
```typescript
// HTTP-only cookies (Next.js)
cookieStore.set(SESSION_KEY, newSessionId, {
  httpOnly: true,
  secure: process.env.NODE_ENV === "production",
  sameSite: "lax",
  maxAge: 60 * 60 * 24 * 30, // 30 days
});
```
**Basic Request Validation**
```python
from starlette.responses import RedirectResponse

# HTMX request validation
if "hx-request" not in request.headers:
    return RedirectResponse("/search")
```
```typescript
// Parameter validation
if (!query) {
  return NextResponse.json({ error: "Query is required" }, { status: 400 });
}
```
### ⚠️ **Security Limitations & Risks**
#### **MISSING Security Features:**
**❌ No API Authentication**
- Local API endpoints are **completely open**
- No rate limiting or abuse protection
- No user authentication or authorization
- Anyone can access `/fetch_results`, `/get_sim_map` endpoints
**❌ No Input Sanitization**
```typescript
// Raw user input passed directly to models
const query = searchParams.get("query"); // No validation/sanitization
const ranking = searchParams.get("ranking"); // No input filtering
```
**❌ No Security Headers**
- No CORS configuration
- No Content Security Policy (CSP)
- No X-Frame-Options protection
- No X-Content-Type-Options validation
**❌ No Rate Limiting**
- Unlimited API requests
- No protection against DoS attacks
- No query throttling or user limits
**❌ No CSRF Protection**
- No token validation for state-changing operations
- Cross-site request forgery possible
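One common way to close this gap is a token derived from the session with a server-side secret. The sketch below is a generic HMAC-based pattern, not code from this project; it assumes a per-session identifier such as the `session_id` shown earlier and a secret held in an environment variable (for example a hypothetical `CSRF_SECRET`).

```python
import hashlib
import hmac


def make_csrf_token(secret: str, session_id: str) -> str:
    """Derive a per-session token that only the server can compute."""
    return hmac.new(secret.encode(), session_id.encode(), hashlib.sha256).hexdigest()


def verify_csrf_token(secret: str, session_id: str, token: str) -> bool:
    """Check a submitted token in constant time."""
    expected = make_csrf_token(secret, session_id)
    # Compare as bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), token.encode())
```

The server would embed `make_csrf_token(...)` in each form or HTMX request and reject any state-changing request whose token fails `verify_csrf_token`.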
### 🎯 **Security Recommendations**
#### **IMMEDIATE (High Priority)**
**1. Add API Authentication**
```typescript
// middleware.ts - Add API key validation
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  const apiKey = request.headers.get("X-API-Key");
  if (!apiKey || apiKey !== process.env.COLPALI_API_KEY) {
    return new Response("Unauthorized", { status: 401 });
  }
  return NextResponse.next();
}
```
**2. Implement Rate Limiting**
```typescript
// Use next-rate-limit or similar
import rateLimit from "@/lib/rate-limit";

const limiter = rateLimit({
  interval: 60 * 1000, // 1 minute
  uniqueTokenPerInterval: 500, // Track at most 500 distinct clients per interval
});

await limiter.check(10, getClientIP(request)); // 10 requests per minute per client
```
**3. Add Security Headers**
```typescript
// next.config.js
const securityHeaders = [
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
  {
    key: "Content-Security-Policy",
    value: "default-src 'self'; script-src 'self' 'unsafe-inline'",
  },
];
```
**4. Input Validation & Sanitization**
```typescript
import { z } from "zod";

const SearchSchema = z.object({
  query: z
    .string()
    .min(1)
    .max(500)
    .regex(/^[a-zA-Z0-9\s\.\?\!]*$/),
  ranking: z.enum(["hybrid", "colpali", "bm25"]),
});
```
#### **MEDIUM Priority**
**5. CORS Configuration**
```typescript
// Restrict origins to known domains
const corsHeaders = {
  "Access-Control-Allow-Origin": "https://yourdomain.com",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization",
};
```
**6. Request Size Limits**
```typescript
// Limit request payload sizes
export const config = {
  api: {
    bodyParser: {
      sizeLimit: "1mb",
    },
  },
};
```
```
**7. Audit Logging**
```python
# Log all API access with IP, timestamp, and queries
logger.info(f"API_ACCESS: {client_ip} - {endpoint} - {query[:100]}")
```
#### **LONG-TERM (Production Ready)**
**8. User Authentication (Optional)**
```typescript
// Add NextAuth.js or similar for user accounts
// Implement role-based access control
// Add document ownership and permissions
```
**9. Network Security**
```bash
# Deploy behind reverse proxy (nginx/cloudflare)
# Enable DDoS protection
# Use Web Application Firewall (WAF)
```
**10. Data Privacy Controls**
```typescript
// Implement data retention policies
// Add user data deletion capabilities
// GDPR compliance features
```
### 🔒 **Security Best Practices**
#### **For LOCAL Development:**
- **Never commit API keys** to version control
- **Use strong environment variable names** (avoid `API_KEY`)
- **Rotate API keys regularly** (monthly)
- **Enable firewall** on development machines
- **Use HTTPS even locally** for production testing
#### **For PRODUCTION Deployment:**
- **Deploy behind CDN/WAF** (Cloudflare, AWS Shield)
- **Enable rate limiting** at infrastructure level
- **Use container security scanning**
- **Implement monitoring and alerting**
- **Regular security audits and penetration testing**
#### **For REMOTE Services:**
- **Vespa Cloud**: Follows enterprise security standards
- **Gemini API**: Google-managed security and compliance
- **Environment Isolation**: Separate dev/staging/prod credentials
### 🚨 **Current Risk Level: MEDIUM**
**Suitable for:**
- ✅ **Personal projects and demos**
- ✅ **Internal company tools** (behind firewall)
- ✅ **Research and development** environments
**NOT suitable for:**
- ❌ **Public internet deployment**
- ❌ **Customer-facing applications**
- ❌ **Production environments** with sensitive data
- ❌ **Commercial applications** without security hardening
## 🎯 Usage Guide
### Basic Search
1. Navigate to the homepage
2. Enter your search query in natural language
3. Select ranking method (hybrid, semantic, etc.)
4. View results with similarity maps
### Similarity Maps
- Click on token buttons to see which parts of documents match specific query terms
- Visual heatmaps show attention patterns
- Reset button returns to original document view
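Conceptually, each token button selects one query-token embedding and scores it against every page patch. The sketch below shows that idea under stated assumptions: patches are laid out row by row on a square grid (32 × 32 would follow from 1024 patches), and scores are min-max normalized for display. The function name is hypothetical, and the project's real generator may upsample, blend, or normalize differently.

```python
from typing import List, Sequence


def similarity_map(
    token_vec: Sequence[float],
    patch_vecs: Sequence[Sequence[float]],
    grid_width: int,
) -> List[List[float]]:
    """Score one query token against every patch and reshape into grid rows."""
    if grid_width <= 0 or not patch_vecs or len(patch_vecs) % grid_width != 0:
        raise ValueError("patch count must be a positive multiple of grid_width")
    scores = [sum(t * p for t, p in zip(token_vec, patch)) for patch in patch_vecs]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # Min-max normalize to [0, 1]; a flat map becomes all zeros.
    normalized = [(s - lo) / span if span else 0.0 for s in scores]
    return [normalized[i:i + grid_width] for i in range(0, len(normalized), grid_width)]


# Toy 2x2 grid: the top-left patch matches the token best.
heat = similarity_map([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.0, 0.0]], 2)
```

The resulting grid can then be scaled to the page image size and overlaid as a heatmap.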
### AI Chat
- Ask questions about retrieved documents
- Chat responses are based on document content
- Streaming responses for real-time interaction
### Search Rankings
- **Hybrid**: Combines multiple ranking signals
- **Semantic**: Pure semantic similarity
- **BM25**: Traditional text-based ranking
- **ColPali**: Visual-first ranking
## 🛠️ Development
### Project Structure
```
├── main.py                 # Application entry point
├── backend/
│   ├── colpali.py          # ColPali model integration
│   ├── vespa_app.py        # Vespa client and queries
│   └── modelmanager.py     # Model management utilities
├── frontend/
│   ├── app.py              # UI components
│   └── layout.py           # Layout templates
├── feed_vespa.py           # Document upload script
├── deploy_vespa_app.py     # Vespa deployment script
├── colpali-with-snippets/  # Vespa schema definitions
└── static/                 # Static assets and generated files
```
### Running in Development
```bash
# Enable hot reload
export HOT_RELOAD=true
python main.py
# Or set in .env
echo "HOT_RELOAD=true" >> .env
```
### Code Quality
```bash
# Format code
ruff format .
# Lint code
ruff check .
```
## 📊 API Endpoints
### **Current API Routes (⚠️ UNSECURED)**
| Endpoint         | Method | Description             | Security Status  |
| ---------------- | ------ | ----------------------- | ---------------- |
| `/`              | GET    | Homepage                | ✅ Public (safe) |
| `/search`        | GET    | Search interface        | ✅ Public (safe) |
| `/fetch_results` | GET    | Fetch search results    | ⚠️ **OPEN API**  |
| `/get_sim_map`   | GET    | Get similarity maps     | ⚠️ **OPEN API**  |
| `/get-message`   | GET    | Chat with AI (SSE)      | ⚠️ **OPEN API**  |
| `/full_image`    | GET    | Get full document image | ⚠️ **OPEN API**  |
| `/suggestions`   | GET    | Query autocomplete      | ⚠️ **OPEN API**  |
| `/static/*`      | GET    | Static file serving     | ✅ Public (safe) |
### **Security Analysis by Endpoint**
#### **🔒 SECURE Endpoints**
- **`/`** and **`/search`**: Static HTML pages, no sensitive data
- **`/static/*`**: Public assets (CSS, JS, images)
#### **⚠️ UNSECURED Endpoints (Risk)**
**`/fetch_results`** - **HIGH RISK**
```bash
# Anyone can perform unlimited searches
curl "http://localhost:7860/fetch_results?query=secret&ranking=hybrid"
```
- **Risks**: Resource abuse, server overload, competitive intelligence gathering
- **Exposes**: Search capabilities, document metadata, processing times
**`/get_sim_map`** - **MEDIUM RISK**
```bash
# Access similarity maps without authentication
curl "http://localhost:7860/get_sim_map?query_id=123&idx=0&token=word&token_idx=5"
```
- **Risks**: Unauthorized access to visual analysis
- **Exposes**: Document visual patterns, query insights
**`/get-message`** - **HIGH RISK**
```bash
# Trigger AI processing without limits
curl "http://localhost:7860/get-message?query_id=123&query=question&doc_ids=doc1,doc2"
```
- **Risks**: Gemini API abuse, cost exploitation, resource exhaustion
- **Exposes**: AI-generated insights, document content analysis
**`/full_image`** - **HIGH RISK**
```bash
# Download any document image
curl "http://localhost:7860/full_image?doc_id=any_document_id"
```
- **Risks**: Unauthorized document access, data leakage
- **Exposes**: Full document images, potentially sensitive content
### **Immediate Security Fixes**
#### **1. Add API Key Authentication**
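```python
# Python FastHTML middleware (assumes `app` exposes Starlette's `middleware` decorator)
import hmac
import os

from starlette.responses import JSONResponse

@app.middleware("http")
async def verify_api_key(request, call_next):
    if request.url.path.startswith("/fetch_results"):
        api_key = request.headers.get("X-API-Key", "")
        expected = os.getenv("COLPALI_API_KEY", "")
        # Constant-time comparison; reject everything if no key is configured
        if not expected or not hmac.compare_digest(api_key.encode(), expected.encode()):
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)
```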
#### **2. Implement Rate Limiting**
```python
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@rt("/fetch_results")
@limiter.limit("10/minute")  # 10 requests per minute per IP
async def get_results(request, query: str, ranking: str):
    ...  # existing code
```
#### **3. Input Validation**
```python
from pydantic import BaseModel, field_validator  # Pydantic v2; v1 uses `validator`

class SearchRequest(BaseModel):
    query: str
    ranking: str

    @field_validator("query")
    @classmethod
    def query_must_be_safe(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError("Query is required")
        if len(v) > 500:
            raise ValueError("Query too long")
        # Add sanitization logic
        return v
```
#### **4. Request Origin Validation**
```python
from starlette.responses import JSONResponse

ALLOWED_ORIGINS = ["http://localhost:3000", "https://yourdomain.com"]

@app.middleware("http")
async def cors_middleware(request, call_next):
    origin = request.headers.get("origin")
    # Browsers omit Origin on plain navigation, so only reject a present, unknown origin
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return JSONResponse({"error": "Forbidden"}, status_code=403)
    return await call_next(request)
```
### **📈 Recommended API Security Architecture**
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │  Rate Limiter   │    │   Backend API   │
│                 │    │                 │    │                 │
│ • API Key       │◄──►│ • IP Limiting   │◄──►│ • Input Valid.  │
│ • CORS Headers  │    │ • User Quotas   │    │ • Auth Checks   │
│ • Request Valid.│    │ • DoS Protection│    │ • Audit Logs    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
**Benefits:**
- **Layer 1**: Frontend validates requests before sending
- **Layer 2**: Rate limiter prevents abuse and DoS attacks
- **Layer 3**: Backend performs final validation and authorization
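To make Layer 2 concrete, here is a minimal in-memory sliding-window limiter keyed by client identifier (for example an IP address). It is a sketch for a single process only; the class name and defaults are hypothetical, and a multi-worker deployment would need a shared store or a dedicated library instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(
        self,
        max_requests: int = 10,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(client_ip)` first and return HTTP 429 when it yields `False`.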
### **🔒 Security Implementation Checklist**
#### **Before Production Deployment:**
**CRITICAL (Must Do):**
- [ ] **Generate API Key**: Create strong API key for endpoint authentication
- [ ] **Enable Rate Limiting**: Implement per-IP request limits
- [ ] **Add Security Headers**: X-Frame-Options, CSP, X-Content-Type-Options
- [ ] **Input Validation**: Sanitize all user inputs (query, ranking)
- [ ] **CORS Configuration**: Restrict origins to known domains only
- [ ] **Environment Security**: Never commit API keys, use secure .env
- [ ] **HTTPS Only**: Force TLS in production (no HTTP)
**HIGH Priority:**
- [ ] **Audit Logging**: Log all API requests with IP and timestamp
- [ ] **Request Size Limits**: Prevent large payload attacks
- [ ] **Error Handling**: Don't expose stack traces or internal details
- [ ] **Session Security**: HTTP-only, secure, SameSite cookies
- [ ] **API Documentation**: Document authentication requirements
**MEDIUM Priority:**
- [ ] **User Authentication**: Consider adding user accounts for access control
- [ ] **Request Timeout**: Prevent long-running request abuse
- [ ] **Content Validation**: Verify response content types
- [ ] **Monitoring**: Set up alerts for unusual API usage patterns
- [ ] **Backup Strategy**: Secure backup of environment variables
#### **Security Testing Commands:**
**Test API Authentication:**
```bash
# Should fail without API key
curl "http://localhost:7860/fetch_results?query=test&ranking=hybrid"
# Should succeed with API key
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test&ranking=hybrid"
```
**Test Rate Limiting:**
```bash
# Run multiple requests to trigger rate limit
for i in {1..15}; do
  curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test$i&ranking=hybrid"
  echo "Request $i"
done
```
**Test Input Validation:**
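```bash
# Should reject invalid/malicious inputs (-G with --data-urlencode keeps the query string valid)
curl -G -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results" \
  --data-urlencode "query=<script>alert('xss')</script>" \
  --data-urlencode "ranking=invalid"
```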
**Test Security Headers:**
```bash
# Check security headers in response
curl -I "http://localhost:7860/"
# Should see: X-Frame-Options, X-Content-Type-Options, etc.
```
#### **Security Monitoring:**
**Log Analysis Queries:**
```bash
# Monitor API usage patterns
grep "API_ACCESS" /var/log/colpali.log | tail -100
# Detect potential abuse
grep "RATE_LIMIT_EXCEEDED" /var/log/colpali.log
# Check authentication failures
grep "UNAUTHORIZED" /var/log/colpali.log
```
**Alerting Setup:**
- **Rate Limit Violations**: Alert when >50 requests/minute from single IP
- **Authentication Failures**: Alert on repeated unauthorized attempts
- **Unusual Queries**: Alert on suspicious query patterns or injection attempts
- **Resource Usage**: Alert on high CPU/memory usage (potential DoS)
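The rate-limit alert could be driven by a small script over the audit log. The sketch below assumes log lines follow the `API_ACCESS: {client_ip} - {endpoint} - {query}` format from the audit logging example and that the caller passes in only the lines from the one-minute window being checked; the function name and threshold default are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches the message part of an audit line, ignoring any logger prefix.
ACCESS_PATTERN = re.compile(r"API_ACCESS: (?P<ip>\S+) - (?P<endpoint>\S+) - ")


def noisy_clients(log_lines: Iterable[str], threshold: int = 50) -> List[str]:
    """Return client IPs with more than `threshold` requests in the given lines."""
    counts: Counter = Counter()
    for line in log_lines:
        match = ACCESS_PATTERN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return sorted(ip for ip, n in counts.items() if n > threshold)


sample = ["INFO API_ACCESS: 10.0.0.1 - /fetch_results - test"] * 3 + [
    "INFO API_ACCESS: 10.0.0.2 - /get_sim_map - heatmap",
]
flagged = noisy_clients(sample, threshold=2)
```

With the sample above only `10.0.0.1` exceeds the threshold of 2; in practice the threshold would be the 50 requests per minute named in the alert rule.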
## 🧪 Models Used
- **ColPali v1.2**: Visual document understanding
- **PaliGemma 3B**: Base vision-language model
- **Google Gemini 2.0**: AI chat and question answering
## 🔧 Configuration Options
### Environment Variables
| Variable | Required | Description | Security Impact |
| -------------------------- | -------- | ------------------------------------------- | ----------------------------------- |
| `VESPA_APP_TOKEN_URL` | Yes\* | Vespa application URL (token auth) | **HIGH** - Remote access |
| `VESPA_CLOUD_SECRET_TOKEN` | Yes\* | Vespa secret token | **CRITICAL** - Full database access |
| `USE_MTLS` | No | Use mTLS instead of token auth | **MEDIUM** - Auth method |
| `VESPA_APP_MTLS_URL` | Yes\*\* | Vespa application URL (mTLS) | **HIGH** - Remote access |
| `VESPA_CLOUD_MTLS_KEY` | Yes\*\* | mTLS private key | **CRITICAL** - TLS credentials |
| `VESPA_CLOUD_MTLS_CERT` | Yes\*\* | mTLS certificate | **HIGH** - TLS credentials |
| `GEMINI_API_KEY` | No | Google Gemini API key | **HIGH** - AI access/costs |
| `LOG_LEVEL` | No | Logging level (DEBUG, INFO, WARNING, ERROR) | **LOW** - Debug info |
| `HOT_RELOAD` | No | Enable hot reload in development | **LOW** - Dev convenience |
#### **🔒 Security-Related Environment Variables (Recommended)**
| Variable | Required | Description | Default |
| -------------------------- | --------- | ------------------------------------ | ------- |
| `COLPALI_API_KEY`          | **YES\*\*\*** | API key for endpoint authentication  | None    |
| `ALLOWED_ORIGINS`          | **YES\*\*\*** | Comma-separated allowed CORS origins | None    |
| `RATE_LIMIT_REQUESTS`      | No        | Max requests per minute per IP       | `10`    |
| `RATE_LIMIT_WINDOW`        | No        | Rate limit window in seconds         | `60`    |
| `MAX_QUERY_LENGTH`         | No        | Maximum query string length          | `500`   |
| `ENABLE_AUDIT_LOGGING`     | No        | Log all API requests for security    | `false` |
| `SECURITY_HEADERS_ENABLED` | No        | Enable security headers              | `true`  |
| `CSRF_SECRET`              | **YES\*\*\*** | Secret for CSRF token generation     | None    |
**Example Security-Enhanced `.env`:**
```bash
# Existing configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_vespa_secret_token
GEMINI_API_KEY=your_gemini_api_key
# NEW: Security configuration
COLPALI_API_KEY=your_strong_random_api_key_here
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60
MAX_QUERY_LENGTH=500
ENABLE_AUDIT_LOGGING=true
SECURITY_HEADERS_ENABLED=true
CSRF_SECRET=your_random_csrf_secret_here
# Development vs Production
NODE_ENV=production # Enable secure cookies
LOG_LEVEL=INFO # Don't expose debug info in production
```
\*Required for token authentication
\*\*Required for mTLS authentication
\*\*\*Required for production security
## 🚨 Troubleshooting
### **LOCAL Processing Issues**
**ColPali model fails to load:**
```bash
# Check GPU memory
nvidia-smi # For NVIDIA GPUs
# or
system_profiler SPDisplaysDataType # For Apple Silicon
# Clear model cache if corrupted
rm -rf ~/.cache/huggingface/hub/models--vidore--colpali-v1.2
```
**Out of memory errors:**
- Reduce batch size in `feed_vespa.py` (try `batch_size=1`)
- Close other applications to free RAM/VRAM
- Use CPU processing if GPU memory insufficient: `CUDA_VISIBLE_DEVICES="" python main.py`
**Slow processing on CPU:**
- Expected behavior - ColPali requires significant computation
- Consider upgrading to GPU or Apple Silicon for 5-10x speedup
- Process documents overnight for large collections
### **REMOTE Processing Issues**
**Connection to Vespa fails:**
- Verify your Vespa URL and credentials in `.env`
- Check if the Vespa application is deployed and running
- Ensure network connectivity: `ping your-app.vespa-cloud.com`
- Validate authentication tokens haven't expired
**Document upload fails:**
- Check Vespa Cloud storage quota and billing
- Verify embedding format matches Vespa schema
- Ensure stable internet connection for large uploads
**Search returns no results:**
- Confirm documents were successfully uploaded to Vespa
- Check if embeddings were properly indexed
- Verify query processing isn't failing locally
### **MIXED (Local + Remote) Issues**
**Chat features don't work:**
- **LOCAL**: Verify document images are being generated locally
- **REMOTE**: Check `GEMINI_API_KEY` is set correctly
- **REMOTE**: Verify Gemini API quota and billing
- **NETWORK**: Ensure images can be sent to Gemini API
**Similarity maps missing:**
- **LOCAL**: Confirm ColPali model loaded successfully
- **LOCAL**: Check if similarity map generation completed
- **REMOTE**: Verify Vespa returned similarity data
- **BROWSER**: Clear browser cache for static files
### Performance Tips
**LOCAL Optimization:**
- Use GPU acceleration for 5-10x faster model inference
- Optimize batch sizes based on available memory
- Use SSD storage for faster model loading
- Consider quantized models for lower memory usage
**REMOTE Optimization:**
- Use Vespa's HNSW indexing for faster search
- Optimize embedding dimensions vs accuracy tradeoff
- Enable compression for faster network transfer
- Use multiple Vespa instances for high availability
**NETWORK Optimization:**
- Process documents in batches to reduce upload overhead
- Use compression for embedding transfer
- Consider regional Vespa deployment for lower latency
## 📄 License
Apache-2.0
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests and linting
5. Submit a pull request
## 📞 Support
For issues and questions:
- Check the troubleshooting section
- Review Vespa and ColPali documentation
- Open an issue on the repository