# ColPali + Vespa - Visual Retrieval System

A powerful visual document retrieval system that combines **ColPali** (a ColBERT-style late-interaction retriever built on the PaliGemma vision-language model) with **Vespa** for scalable, intelligent document search and question answering.

## Features

### **Visual Document Search**

- **Multi-modal retrieval**: Search through PDF documents using natural language queries
- **Visual understanding**: The ColPali model processes document page images, including their text and layout
- **Token-level similarity maps**: Visualize exactly which parts of documents match your query
- **Multiple ranking algorithms**: Choose between hybrid, semantic, and other ranking methods

### **AI-Powered Chat**

- **Intelligent Q&A**: Ask questions about retrieved documents using Google Gemini 2.0
- **Context-aware responses**: AI analyzes document images to provide accurate answers
- **Real-time streaming**: Get responses as they're generated

### **Scalable Infrastructure**

- **Vespa integration**: Enterprise-grade search platform for large document collections
- **Real-time processing**: Instant search results and similarity map generation
- **Cloud-ready**: Supports Vespa Cloud deployment with secure authentication
## Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend     │    │   Vespa Cloud   │
│    (Browser)    │    │   (Your Local   │    │    (Remote)     │
│                 │    │    Computer)    │    │                 │
│ • Search UI     │◄──►│ • ColPali Model │◄──►│ • Document Store│
│ • Similarity    │    │ • Query Proc.   │    │ • Vector Search │
│   Maps          │    │ • Sim Map Gen.  │    │ • Ranking       │
│ • Chat Interface│    │ • Gemini Int.   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
    Web Browser              LOCAL AI            REMOTE Storage
```
### **LOCAL Processing (Your Computer)**

**All ColPali model inference happens on YOUR local machine:**

- **ColPali Model**: Runs locally on your GPU/CPU (~7GB model)
- **Document Processing**: PDF → Images → Embeddings (local)
- **Query Processing**: Text → Embeddings (local)
- **Similarity Maps**: Visual attention generation (local)
- **Gemini Chat**: Retrieved page images are prepared locally, then sent to the remote Gemini API, which generates the answer

**Device Detection:**

```python
from colpali_engine.utils.torch_utils import get_torch_device

device = get_torch_device("auto")  # Detects: CUDA, MPS (Apple), or CPU
print(f"Using device: {device}")  # Shows YOUR hardware
```
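The detection order named in the comment above (CUDA, then Apple MPS, then CPU) can be sketched with plain PyTorch calls. This is a minimal illustration, not the helper the project imports; `pick_device` and `detect_device` are hypothetical names, and the sketch assumes CPU is the right fallback when PyTorch is not installed.

```python
import importlib.util


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred torch device name: CUDA first, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe the local hardware; fall back to CPU when PyTorch is missing (assumption)."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # torch.backends.mps only exists on PyTorch builds that ship MPS support
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        cuda_available=torch.cuda.is_available(),
        mps_available=mps is not None and mps.is_available(),
    )


if __name__ == "__main__":
    print(f"Using device: {detect_device()}")
```

Keeping the decision in a pure function (`pick_device`) makes the preference order testable on machines without a GPU.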
### **REMOTE Processing (Vespa Cloud)**

**Only storage and search index operations happen remotely:**

- **Document Storage**: Stores processed embeddings (not raw models)
- **Vector Search**: Fast similarity search across document collection
- **Query Routing**: Handles search requests and ranking
- **Metadata Storage**: Document titles, URLs, page numbers
### **Complete Data Flow**

#### **Document Upload Process:**

1. **LOCAL**: Your computer downloads PDF from URL
2. **LOCAL**: PDF pages are converted to images
3. **LOCAL**: ColPali generates visual embeddings (1024 patches × 128 dims)
4. **LOCAL**: Embeddings converted to binary format for efficiency
5. **REMOTE**: Binary embeddings uploaded to Vespa Cloud
6. **REMOTE**: Vespa indexes embeddings for fast search
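
Step 4 can be illustrated with NumPy. This sketch assumes sign-based binarization (each of the 128 dimensions becomes 1 if positive, else 0), packed 8 bits per byte; `binarize_patch_embeddings` is a hypothetical name and the project's feeding code may differ in detail. It also shows the per-page size before and after.

```python
import numpy as np


def binarize_patch_embeddings(embeddings: np.ndarray) -> np.ndarray:
    """Turn float patch embeddings of shape (patches, 128) into packed bits.

    Each dimension becomes 1 if positive and 0 otherwise; np.packbits then
    packs 8 bits per byte, so 128 dims become 16 bytes per patch.
    """
    bits = (embeddings > 0).astype(np.uint8)
    # Reinterpret the packed uint8 bytes as int8 without changing the bits
    return np.packbits(bits, axis=-1).view(np.int8)


# Random stand-in for one page: 1024 patches x 128 dims of float32
page = np.random.default_rng(0).standard_normal((1024, 128)).astype(np.float32)
packed = binarize_patch_embeddings(page)

print(page.nbytes)    # 524288 (1024 * 128 * 4 bytes)
print(packed.shape)   # (1024, 16)
print(packed.nbytes)  # 16384 (1024 * 16 bytes)
```

Binarizing shrinks one page from 512 KB of float32 values to 16 KB, trading some scoring precision for storage and transfer size.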
#### **Search Query Process:**

1. **LOCAL**: You enter a search query in the browser
2. **LOCAL**: ColPali processes the query → generates query embeddings
3. **REMOTE**: Query embeddings sent to Vespa Cloud
4. **REMOTE**: Vespa searches the document index and returns matches
5. **LOCAL**: ColPali generates similarity maps for the results
6. **BROWSER**: Results displayed with visual attention maps
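
ColPali-style ranking scores a page by late interaction (often called MaxSim): each query token is matched against its most similar page patch, and those best-match scores are summed. In this architecture that scoring presumably runs inside Vespa's ranking (step 4); the NumPy sketch below only illustrates the math on toy 2-dimensional vectors, and `maxsim_score` is a hypothetical name.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) relevance of one page for one query.

    query_emb: (query_tokens, dim), page_emb: (patches, dim).
    For every query token, take its best-matching patch (max dot product),
    then sum those maxima over all query tokens.
    """
    sims = query_emb @ page_emb.T  # (query_tokens, patches)
    return float(sims.max(axis=1).sum())


query = np.array([[1.0, 0.0], [0.0, 1.0]])
page_a = np.array([[0.9, 0.1], [0.2, 0.8]])  # one patch matches each query token well
page_b = np.array([[0.5, 0.5], [0.4, 0.4]])  # only mediocre matches

print(maxsim_score(query, page_a))
print(maxsim_score(query, page_b))
```

Because every query token picks its own best patch, a page scores well when different regions answer different parts of the query.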
#### **AI Chat Process:**

1. **LOCAL**: Retrieved document images are processed by your machine
2. **REMOTE**: Images + query sent to Google Gemini API
3. **REMOTE**: Gemini generates a response based on the visual content
4. **BROWSER**: Streaming response displayed in real time
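
The chat endpoint (`/get-message`) streams its answer as Server-Sent Events (SSE). A minimal sketch of SSE framing, assuming the Gemini response arrives as an iterable of text chunks; `to_sse` is a hypothetical name and the Gemini client call itself is left out.

```python
from typing import Iterable, Iterator, Optional


def to_sse(chunks: Iterable[str], event: Optional[str] = None) -> Iterator[str]:
    """Wrap streamed text chunks as Server-Sent Events frames.

    Each frame is one 'data: <line>' per line of the chunk, terminated by a
    blank line; the browser's EventSource rejoins multiple data lines with '\n'.
    """
    for chunk in chunks:
        lines = [f"event: {event}"] if event else []
        lines += [f"data: {line}" for line in chunk.split("\n")]
        yield "\n".join(lines) + "\n\n"
```

Splitting on newlines matters: a raw newline inside a single `data:` line would otherwise end the field early and garble the streamed answer.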
### Core Components

- **ColPali Model**: Visual-language model for document understanding (LOCAL)
- **Vespa Search**: Distributed search and storage engine (REMOTE)
- **FastHTML Frontend**: Modern, responsive web interface (BROWSER)
- **Gemini Integration**: AI-powered question answering (REMOTE API)
- **Similarity Map Generator**: Visual attention visualization (LOCAL)
## **System Requirements**

### **LOCAL Machine Requirements (For AI Processing)**

**Minimum:**

- **CPU**: Modern multi-core processor (Intel/AMD/Apple Silicon)
- **RAM**: 8GB+ (16GB recommended)
- **Storage**: 10GB free space (for model cache)
- **Python**: 3.10+ (< 3.13)

**Recommended:**

- **GPU**: NVIDIA GPU with 8GB+ VRAM (RTX 3070/4060 or better)
- **Apple**: M1/M2/M3 Mac (uses Metal Performance Shaders)
- **RAM**: 16GB+ for smoother processing
- **Storage**: SSD for faster model loading

**Performance Examples:**

- **RTX 4090**: ~1-2 seconds per query
- **RTX 3070**: ~3-5 seconds per query
- **Apple M2**: ~4-6 seconds per query
- **CPU Only**: ~15-30 seconds per query
### **REMOTE Requirements (Vespa Cloud)**

**What you need:**

- **Vespa Cloud account** (handles all remote processing)
- **Internet connection** (for uploading embeddings and search queries)
- **Authentication tokens** (provided by Vespa Cloud)

**What Vespa Cloud provides:**

- **Scalable storage** for any number of documents
- **Sub-second search** across millions of embeddings
- **High availability** with automatic failover
- **Global CDN** for fast access worldwide
## **Cost Breakdown**

### **FREE Components**

- **ColPali Model**: Open source, runs locally (no per-query costs)
- **Python Application**: Apache-2.0 licensed, completely free
- **Local Processing**: Uses your own hardware (no cloud AI fees)

### **PAID Components**

- **Vespa Cloud**: Pay for storage and search operations
  - ~$0.001 per 1000 searches
  - ~$0.10 per GB storage per month
- **Google Gemini API**: Optional, for chat features only
  - ~$0.01 per 1000 image tokens
  - Only used when you ask questions about documents

### **Cost Examples (Monthly, Approximate)**

- **Personal Use** (100 documents, 1000 searches): ~$5-10/month
- **Small Business** (1000 documents, 10k searches): ~$20-50/month
- **Enterprise** (10k+ documents, 100k+ searches): $200+/month

**Cost Optimization Tips:**

- Use a local Vespa installation to avoid cloud costs
- Disable Gemini chat if not needed (saves API costs)
- Process documents in batches to minimize upload time
## Quick Start

### Prerequisites

- Python 3.10+ (< 3.13)
- **8GB+ RAM** for ColPali model
- **Vespa Cloud account** or local Vespa installation
- **Google Gemini API key** (optional, for chat features)
- **GPU recommended** but not required

### 1. Installation

```bash
# Clone the repository
git clone <repository-url>
cd colpali-vespa-visual-retrieval

# Install dependencies
pip install -e .

# For development
pip install -e ".[dev]"

# For document feeding capabilities
pip install -e ".[feed]"
```
### 2. Environment Configuration

Create a `.env` file with your configuration:

```bash
# Vespa Configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_secret_token

# Alternative: mTLS Authentication
USE_MTLS=false
VESPA_APP_MTLS_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_MTLS_KEY="-----BEGIN PRIVATE KEY-----..."
VESPA_CLOUD_MTLS_CERT="-----BEGIN CERTIFICATE-----..."

# Optional: Gemini AI (for chat features)
GEMINI_API_KEY=your_gemini_api_key

# Optional: Logging
LOG_LEVEL=INFO
HOT_RELOAD=false
```
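The `USE_MTLS` switch decides which group of Vespa variables must be present. A minimal sketch of that check, assuming the values are read from the process environment after the `.env` file is loaded (for example with python-dotenv's `load_dotenv()`); `resolve_vespa_auth` is a hypothetical helper, not the project's own loader.

```python
from collections.abc import Mapping


def resolve_vespa_auth(env: Mapping[str, str]) -> dict[str, str]:
    """Pick token or mTLS auth from .env-style settings and check the required keys."""
    use_mtls = env.get("USE_MTLS", "false").strip().lower() in {"1", "true", "yes"}
    if use_mtls:
        required = ("VESPA_APP_MTLS_URL", "VESPA_CLOUD_MTLS_KEY", "VESPA_CLOUD_MTLS_CERT")
        mode = "mtls"
    else:
        required = ("VESPA_APP_TOKEN_URL", "VESPA_CLOUD_SECRET_TOKEN")
        mode = "token"
    # Treat empty strings the same as missing values
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError(f"Missing {mode} settings: {', '.join(missing)}")
    return {"mode": mode, **{key: env[key] for key in required}}
```

Usage: call `resolve_vespa_auth(os.environ)` once at startup, after `load_dotenv()`. Failing fast on a missing key gives a clearer error than a rejected request to Vespa later.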
### 3. Deploy Vespa Application

```bash
# Deploy the Vespa schema and configuration
python deploy_vespa_app.py \
  --tenant_name your_tenant \
  --vespa_application_name colpalidemo \
  --token_id_write colpalidemo_write \
  --token_id_read colpalidemo_read
```

### 4. Run the Application

```bash
python main.py
```

The application will be available at `http://localhost:7860`.
## Document Management

### Uploading Documents

Use the feeding script to process and upload PDF documents:

```bash
python feed_vespa.py \
  --application_name colpalidemo \
  --vespa_schema_name pdf_page
```

**Document Processing Pipeline (LOCAL → REMOTE):**

1. **PDF Download** (LOCAL): Your computer downloads PDFs from URLs
2. **PDF Conversion** (LOCAL): PDFs converted to images (one per page)
3. **ColPali Processing** (LOCAL): Each page processed by the ColPali model on YOUR GPU/CPU
4. **Embedding Generation** (LOCAL): Visual embeddings created (1024 patches × 128 dimensions)
5. **Binary Encoding** (LOCAL): Embeddings converted to efficient binary format
6. **Vespa Upload** (REMOTE): Binary embeddings uploaded to Vespa Cloud
7. **Search Indexing** (REMOTE): Vespa indexes embeddings for fast retrieval

**⚠️ Important Notes:**

- **Processing Time**: Expect 5-30 seconds per page depending on your hardware
- **Network Usage**: Only the final binary embeddings are uploaded (~16KB per page: 1024 patches × 16 bytes of packed bits, vs ~1MB original)
- **Privacy**: Original PDFs and images stay on your local machine; page images leave it only when you use the chat feature, which sends them to the Gemini API
- **Storage**: Raw images cached locally for similarity map generation
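
Steps 4-6 end with one Vespa document per page. The sketch below shows what such a document could look like, assuming a schema whose `embedding` field is a mixed tensor such as `tensor<int8>(patch{}, v[16])` fed as one hex string per patch. The field names (`title`, `url`, `page_number`, `embedding`) and `build_feed_document` are hypothetical and must be matched to the actual `pdf_page` schema in `colpali-with-snippets/`; the `id` format also depends on the feed client used.

```python
import numpy as np


def build_feed_document(doc_id: str, title: str, url: str, page_number: int, packed: np.ndarray) -> dict:
    """Build one feed document for a page from its packed binary embeddings.

    packed: int8 array of shape (patches, 16), one row of packed bits per patch.
    """
    # Mixed tensor short form: patch index -> hex string of that patch's 16 bytes
    embedding = {str(idx): row.tobytes().hex() for idx, row in enumerate(packed)}
    return {
        "id": doc_id,
        "fields": {
            "title": title,
            "url": url,
            "page_number": page_number,
            "embedding": embedding,
        },
    }
```

Hex encoding keeps each patch at 32 characters of JSON, which is why the upload stays small compared with sending the page image.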
### Supported Operations

- ✅ **Upload Documents**: Add new PDFs to the system
- ✅ **Search Documents**: Query existing documents
- ✅ **View Documents**: Browse stored documents
- ❌ **Remove Documents**: _Not currently implemented_
- ❌ **Update Documents**: _Not currently implemented_
## Authentication & Security

### **Current Security Implementation**

#### **SECURE Components:**

**Vespa Authentication (REMOTE)**

- **Token Authentication**: Bearer tokens for Vespa Cloud API access
- **mTLS Certificates**: Mutual TLS for enterprise security
- **Encrypted Communication**: HTTPS/TLS for all Vespa connections

**API Key Management (LOCAL)**

- **Environment Variables**: Sensitive keys stored in `.env` files
- **API Key Rotation**: Google Gemini supports key rotation
- **Local Storage**: Keys never transmitted except to authorized APIs
#### **LIMITED Security Components:**

**Session Management**

```
# Python (FastHTML): basic UUID session tracking
session["session_id"] = str(uuid.uuid4())

// TypeScript (Next.js): HTTP-only session cookie
cookieStore.set(SESSION_KEY, newSessionId, {
  httpOnly: true,
  secure: process.env.NODE_ENV === "production",
  sameSite: "lax",
  maxAge: 60 * 60 * 24 * 30, // 30 days
});
```
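**Basic Request Validation**

```python
from starlette.responses import JSONResponse, RedirectResponse


async def get_results(request, query: str = "", ranking: str = "hybrid"):
    # HTMX request validation: non-HTMX requests are redirected to the search page
    if "hx-request" not in request.headers:
        return RedirectResponse("/search")
    # Parameter validation
    if not query:
        return JSONResponse({"error": "Query is required"}, status_code=400)
    ...  # existing search logic
```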
### **Security Limitations & Risks**

#### **MISSING Security Features:**

**❌ No API Authentication**

- Local API endpoints are **completely open**
- No rate limiting or abuse protection
- No user authentication or authorization
- Anyone can access the `/fetch_results` and `/get_sim_map` endpoints

**❌ No Input Sanitization**

```typescript
// Raw user input passed directly to models (Next.js route handler)
const query = searchParams.get("query"); // No validation/sanitization
const ranking = searchParams.get("ranking"); // No input filtering
```
**❌ No Security Headers**

- No CORS configuration
- No Content Security Policy (CSP)
- No X-Frame-Options protection
- No X-Content-Type-Options header

**❌ No Rate Limiting**

- Unlimited API requests
- No protection against DoS attacks
- No query throttling or user limits

**❌ No CSRF Protection**

- No token validation for state-changing operations
- Cross-site request forgery possible
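
One common mitigation is an HMAC-signed token bound to the user's session: the server issues it with the page and verifies it on every state-changing request (with HTMX, typically sent through an `hx-headers` attribute). A minimal standard-library sketch, assuming the recommended `CSRF_SECRET` value is used as the HMAC key; `issue_csrf_token` and `verify_csrf_token` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def issue_csrf_token(secret: bytes, session_id: str) -> str:
    """Create a token bound to one session: random nonce + HMAC(secret, session_id:nonce)."""
    nonce = secrets.token_hex(16)
    mac = hmac.new(secret, f"{session_id}:{nonce}".encode(), hashlib.sha256).hexdigest()
    return f"{nonce}.{mac}"


def verify_csrf_token(secret: bytes, session_id: str, token: str) -> bool:
    """Recompute the HMAC for this session and compare in constant time."""
    nonce, sep, mac = token.partition(".")
    if not sep or not nonce or not mac:
        return False
    expected = hmac.new(secret, f"{session_id}:{nonce}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)
```

Binding the MAC to the session id means a token stolen from one session is useless in another, and the server needs no token storage.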
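### **Security Recommendations**

#### **IMMEDIATE (High Priority)**

**1. Add API Authentication**

```typescript
// middleware.ts - Add API key validation
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  const apiKey = request.headers.get("X-API-Key");
  if (!apiKey || apiKey !== process.env.COLPALI_API_KEY) {
    return new NextResponse("Unauthorized", { status: 401 });
  }
  // Key is valid: continue to the route handler
  return NextResponse.next();
}

// Only run on API routes (adjust the matcher to your actual API paths)
export const config = { matcher: "/api/:path*" };
```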
**2. Implement Rate Limiting**

```typescript
// Assumes a local rate-limit helper at lib/rate-limit.ts (e.g. an LRU-cache
// based limiter like the one in Vercel's Next.js examples) and a local
// getClientIP() helper; neither ships with Next.js itself
import rateLimit from "@/lib/rate-limit";

const limiter = rateLimit({
  interval: 60 * 1000, // 1-minute window
  uniqueTokenPerInterval: 500, // Max number of distinct IPs tracked per window
});

// Inside a route handler: allow 10 requests per minute per IP
await limiter.check(10, getClientIP(request));
```
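For the Python (FastHTML) backend, the same per-IP limit can be sketched without any library as a sliding window over request timestamps. This is an in-process illustration only: it assumes a single worker process (several workers would need a shared store such as Redis), and `SlidingWindowLimiter` is a hypothetical class whose defaults mirror the recommended `RATE_LIMIT_REQUESTS=10` and `RATE_LIMIT_WINDOW=60`.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int = 10, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(client_ip)` and return HTTP 429 when it is `False`.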
**3. Add Security Headers**

```javascript
// next.config.js
const securityHeaders = [
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
  {
    key: "Content-Security-Policy",
    // 'unsafe-inline' keeps inline scripts working but weakens XSS protection
    value: "default-src 'self'; script-src 'self' 'unsafe-inline'",
  },
];

module.exports = {
  // Apply the headers to every route
  async headers() {
    return [{ source: "/:path*", headers: securityHeaders }];
  },
};
```
**4. Input Validation & Sanitization**

```typescript
import { z } from "zod";

const SearchSchema = z.object({
  query: z
    .string()
    .min(1)
    .max(500)
    .regex(/^[a-zA-Z0-9\s\.\?\!]*$/),
  ranking: z.enum(["hybrid", "colpali", "bm25"]),
});
```
#### **MEDIUM Priority**

**5. CORS Configuration**

```typescript
// Restrict origins to known domains
const corsHeaders = {
  "Access-Control-Allow-Origin": "https://yourdomain.com",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  // Include X-API-Key so cross-origin browser requests may send the API key header
  "Access-Control-Allow-Headers": "Content-Type, X-API-Key",
};
```
**6. Request Size Limits**

```typescript
// Limit request payload sizes (per-route config for Next.js Pages Router API routes)
export const config = {
  api: {
    bodyParser: {
      sizeLimit: "1mb",
    },
  },
};
```
**7. Audit Logging**

```python
# Log all API access with IP, timestamp, and queries
logger.info(f"API_ACCESS: {client_ip} - {endpoint} - {query[:100]}")
```
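The one-line audit call above can be wrapped in a small helper that keeps the `API_ACCESS:` prefix the log-analysis commands grep for, truncates long queries, and flattens newlines so a crafted query cannot forge extra log lines. A sketch using only the standard `logging` module; `log_api_access` and the `colpali.audit` logger name are hypothetical.

```python
import logging

audit_logger = logging.getLogger("colpali.audit")


def log_api_access(client_ip: str, endpoint: str, query: str, max_len: int = 100) -> str:
    """Emit one 'API_ACCESS: ip - endpoint - query' line and return the message."""
    # Truncate, then flatten CR/LF so one request always produces one log line
    safe_query = query[:max_len].replace("\n", " ").replace("\r", " ")
    message = f"API_ACCESS: {client_ip} - {endpoint} - {safe_query}"
    audit_logger.info(message)
    return message
```

Configure a handler once at startup, for example `logging.basicConfig(filename="/var/log/colpali.log", level=logging.INFO, format="%(asctime)s %(message)s")`, so each line also carries a timestamp.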
#### **LONG-TERM (Production Ready)**

**8. User Authentication (Optional)**

```typescript
// Add NextAuth.js or similar for user accounts
// Implement role-based access control
// Add document ownership and permissions
```

**9. Network Security**

```bash
# Deploy behind reverse proxy (nginx/cloudflare)
# Enable DDoS protection
# Use Web Application Firewall (WAF)
```

**10. Data Privacy Controls**

```typescript
// Implement data retention policies
// Add user data deletion capabilities
// GDPR compliance features
```
### **Security Best Practices**

#### **For LOCAL Development:**

- **Never commit API keys** to version control
- **Use specific environment variable names** (e.g. `GEMINI_API_KEY` rather than a generic `API_KEY`)
- **Rotate API keys regularly** (monthly)
- **Enable firewall** on development machines
- **Use HTTPS even locally** for production testing

#### **For PRODUCTION Deployment:**

- **Deploy behind CDN/WAF** (Cloudflare, AWS Shield)
- **Enable rate limiting** at infrastructure level
- **Use container security scanning**
- **Implement monitoring and alerting**
- **Regular security audits and penetration testing**

#### **For REMOTE Services:**

- **Vespa Cloud**: Follows enterprise security standards
- **Gemini API**: Google-managed security and compliance
- **Environment Isolation**: Separate dev/staging/prod credentials
### **Current Risk Level: MEDIUM**

**Suitable for:**

- ✅ **Personal projects and demos**
- ✅ **Internal company tools** (behind firewall)
- ✅ **Research and development** environments

**NOT suitable for:**

- ❌ **Public internet deployment**
- ❌ **Customer-facing applications**
- ❌ **Production environments** with sensitive data
- ❌ **Commercial applications** without security hardening
## Usage Guide

### Basic Search

1. Navigate to the homepage
2. Enter your search query in natural language
3. Select a ranking method (hybrid, semantic, etc.)
4. View results with similarity maps

### Similarity Maps

- Click on token buttons to see which parts of documents match specific query terms
- Visual heatmaps show attention patterns
- Reset button returns to original document view
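
A similarity map for one query token can be sketched as the dot product between that token's embedding and every patch embedding, laid out on the patch grid and scaled to [0, 1]. The sketch assumes the 1024 patches form a row-major 32 × 32 grid (a 448-pixel input with 14-pixel patches); `token_similarity_map` is a hypothetical name, and upscaling the heatmap onto the page image is left out.

```python
import numpy as np


def token_similarity_map(query_token: np.ndarray, patch_emb: np.ndarray, grid: int = 32) -> np.ndarray:
    """Heatmap of one query token over a page's patch grid, min-max scaled to [0, 1].

    query_token: (dim,), patch_emb: (grid * grid, dim).
    """
    scores = patch_emb @ query_token  # one similarity score per patch
    heat = scores.reshape(grid, grid)  # assumed row-major patch layout
    lo, hi = heat.min(), heat.max()
    if hi == lo:
        return np.zeros_like(heat, dtype=float)
    return (heat - lo) / (hi - lo)
```

Min-max scaling per token makes the brightest cell the best-matching patch for that token, which is what a clickable token heatmap needs; it does not make heatmaps comparable across tokens.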
### AI Chat

- Ask questions about retrieved documents
- Chat responses are based on document content
- Streaming responses for real-time interaction

### Search Rankings

- **Hybrid**: Combines multiple ranking signals
- **Semantic**: Pure semantic similarity
- **BM25**: Traditional text-based ranking
- **ColPali**: Visual-first ranking
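
This README does not say how the **Hybrid** option combines its signals. Purely as an illustration of merging a text ranking (BM25) with a visual one (ColPali), here is reciprocal rank fusion (RRF), a common model-agnostic technique; the app's actual hybrid rank profile in Vespa may score documents differently, and `reciprocal_rank_fusion` is a hypothetical name. `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for a stable order
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25 = ["a", "b", "c"]
colpali = ["b", "c", "a"]
print(reciprocal_rank_fusion([bm25, colpali]))
```

Here `b` ranks first because it sits near the top of both lists, even though `a` is BM25's top hit; RRF only uses ranks, so the two scorers' raw scales never need to be reconciled.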
## Development

### Project Structure

```
├── main.py                 # Application entry point
├── backend/
│   ├── colpali.py          # ColPali model integration
│   ├── vespa_app.py        # Vespa client and queries
│   └── modelmanager.py     # Model management utilities
├── frontend/
│   ├── app.py              # UI components
│   └── layout.py           # Layout templates
├── feed_vespa.py           # Document upload script
├── deploy_vespa_app.py     # Vespa deployment script
├── colpali-with-snippets/  # Vespa schema definitions
└── static/                 # Static assets and generated files
```

### Running in Development

```bash
# Enable hot reload
export HOT_RELOAD=true
python main.py

# Or set in .env
echo "HOT_RELOAD=true" >> .env
```

### Code Quality

```bash
# Format code
ruff format .

# Lint code
ruff check .
```
## API Endpoints

### **Current API Routes (⚠️ UNSECURED)**

| Endpoint         | Method | Description             | Security Status  |
| ---------------- | ------ | ----------------------- | ---------------- |
| `/`              | GET    | Homepage                | ✅ Public (safe) |
| `/search`        | GET    | Search interface        | ✅ Public (safe) |
| `/fetch_results` | GET    | Fetch search results    | ⚠️ **OPEN API**  |
| `/get_sim_map`   | GET    | Get similarity maps     | ⚠️ **OPEN API**  |
| `/get-message`   | GET    | Chat with AI (SSE)      | ⚠️ **OPEN API**  |
| `/full_image`    | GET    | Get full document image | ⚠️ **OPEN API**  |
| `/suggestions`   | GET    | Query autocomplete      | ⚠️ **OPEN API**  |
| `/static/*`      | GET    | Static file serving     | ✅ Public (safe) |
### **Security Analysis by Endpoint**

#### **SECURE Endpoints**

- **`/`** and **`/search`**: Static HTML pages, no sensitive data
- **`/static/*`**: Public assets (CSS, JS, images)

#### **⚠️ UNSECURED Endpoints (Risk)**

**`/fetch_results`** - **HIGH RISK**

```bash
# Anyone can perform unlimited searches
curl "http://localhost:7860/fetch_results?query=secret&ranking=hybrid"
```

- **Risks**: Resource abuse, server overload, competitive intelligence gathering
- **Exposes**: Search capabilities, document metadata, processing times

**`/get_sim_map`** - **MEDIUM RISK**

```bash
# Access similarity maps without authentication
curl "http://localhost:7860/get_sim_map?query_id=123&idx=0&token=word&token_idx=5"
```

- **Risks**: Unauthorized access to visual analysis
- **Exposes**: Document visual patterns, query insights

**`/get-message`** - **HIGH RISK**

```bash
# Trigger AI processing without limits
curl "http://localhost:7860/get-message?query_id=123&query=question&doc_ids=doc1,doc2"
```

- **Risks**: Gemini API abuse, cost exploitation, resource exhaustion
- **Exposes**: AI-generated insights, document content analysis

**`/full_image`** - **HIGH RISK**

```bash
# Download any document image
curl "http://localhost:7860/full_image?doc_id=any_document_id"
```

- **Risks**: Unauthorized document access, data leakage
- **Exposes**: Full document images, potentially sensitive content
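### **Immediate Security Fixes**

#### **1. Add API Key Authentication**

```python
# Python FastHTML middleware (a FastHTML app is a Starlette app)
import hmac
import os

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

PROTECTED_PATHS = ("/fetch_results",)  # Extend with the other open endpoints


async def verify_api_key(request, call_next):
    if request.url.path.startswith(PROTECTED_PATHS):
        api_key = request.headers.get("X-API-Key", "")
        expected = os.getenv("COLPALI_API_KEY", "")
        # Constant-time comparison; an unset COLPALI_API_KEY rejects every request
        if not expected or not hmac.compare_digest(api_key.encode(), expected.encode()):
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)


app.add_middleware(BaseHTTPMiddleware, dispatch=verify_api_key)
```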
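#### **2. Implement Rate Limiting**

```python
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
# slowapi reads the limiter from app.state and needs a handler that returns HTTP 429
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)


@rt("/fetch_results")
@limiter.limit("10/minute")  # 10 requests per minute per IP
async def get_results(request, query: str, ranking: str):
    ...  # existing handler code
```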
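#### **3. Input Validation**

```python
from typing import Literal

from pydantic import BaseModel, field_validator


class SearchRequest(BaseModel):
    query: str
    ranking: Literal["hybrid", "colpali", "bm25"]

    # Pydantic v2 validator (the v1-style @validator is deprecated)
    @field_validator("query")
    @classmethod
    def query_must_be_safe(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError("Query is required")
        if len(v) > 500:
            raise ValueError("Query too long")
        # Add further sanitization logic here
        return v
```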
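#### **4. Request Origin Validation**

```python
import os

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

DEFAULT_ORIGINS = "http://localhost:3000,https://yourdomain.com"
ALLOWED_ORIGINS = {o.strip() for o in os.getenv("ALLOWED_ORIGINS", DEFAULT_ORIGINS).split(",") if o.strip()}


async def check_origin(request, call_next):
    origin = request.headers.get("origin")
    # Browsers omit Origin on many same-origin GETs, so only reject present-but-unknown origins
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return JSONResponse({"error": "Forbidden"}, status_code=403)
    return await call_next(request)


app.add_middleware(BaseHTTPMiddleware, dispatch=check_origin)
```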
### **Recommended API Security Architecture**

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │  Rate Limiter   │    │   Backend API   │
│                 │    │                 │    │                 │
│ • API Key       │───►│ • IP Limiting   │───►│ • Input Valid.  │
│ • CORS Headers  │    │ • User Quotas   │    │ • Auth Checks   │
│ • Request Valid.│    │ • DoS Protection│    │ • Audit Logs    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

**Benefits:**

- **Layer 1**: Frontend validates requests before sending
- **Layer 2**: Rate limiter prevents abuse and DoS attacks
- **Layer 3**: Backend performs final validation and authorization
### **Security Implementation Checklist**

#### **Before Production Deployment:**

**CRITICAL (Must Do):**

- [ ] **Generate API Key**: Create strong API key for endpoint authentication
- [ ] **Enable Rate Limiting**: Implement per-IP request limits
- [ ] **Add Security Headers**: X-Frame-Options, CSP, X-Content-Type-Options
- [ ] **Input Validation**: Sanitize all user inputs (query, ranking)
- [ ] **CORS Configuration**: Restrict origins to known domains only
- [ ] **Environment Security**: Never commit API keys, use secure .env
- [ ] **HTTPS Only**: Force TLS in production (no HTTP)

**HIGH Priority:**

- [ ] **Audit Logging**: Log all API requests with IP and timestamp
- [ ] **Request Size Limits**: Prevent large payload attacks
- [ ] **Error Handling**: Don't expose stack traces or internal details
- [ ] **Session Security**: HTTP-only, secure, SameSite cookies
- [ ] **API Documentation**: Document authentication requirements

**MEDIUM Priority:**

- [ ] **User Authentication**: Consider adding user accounts for access control
- [ ] **Request Timeout**: Prevent long-running request abuse
- [ ] **Content Validation**: Verify response content types
- [ ] **Monitoring**: Set up alerts for unusual API usage patterns
- [ ] **Backup Strategy**: Secure backup of environment variables
#### **Security Testing Commands:**

**Test API Authentication:**

```bash
# Should fail without API key
curl "http://localhost:7860/fetch_results?query=test&ranking=hybrid"

# Should succeed with API key
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test&ranking=hybrid"
```

**Test Rate Limiting:**

```bash
# Run multiple requests to trigger rate limit
for i in {1..15}; do
  curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test$i&ranking=hybrid"
  echo "Request $i"
done
```

**Test Input Validation:**

```bash
# Should reject invalid/malicious inputs (-G sends the URL-encoded values as a GET query string)
curl -G -H "X-API-Key: your_api_key" \
  --data-urlencode "query=<script>alert('xss')</script>" \
  --data-urlencode "ranking=invalid" \
  "http://localhost:7860/fetch_results"
```

**Test Security Headers:**

```bash
# Check security headers in response
curl -I "http://localhost:7860/"
# Should see: X-Frame-Options, X-Content-Type-Options, etc.
```
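The manual `curl -I` check can be turned into an automated assertion. A minimal sketch, assuming the four headers recommended in the `next.config.js` example are the ones to enforce (with `X-Frame-Options: DENY` and `X-Content-Type-Options: nosniff` required exactly); `missing_security_headers` is a hypothetical name.

```python
from collections.abc import Mapping

# Header name -> exact required value, or None when any value is acceptable
REQUIRED_HEADERS = {
    "x-frame-options": "DENY",
    "x-content-type-options": "nosniff",
    "referrer-policy": None,
    "content-security-policy": None,
}


def missing_security_headers(headers: Mapping[str, str]) -> list[str]:
    """Return required headers that are absent or carry an unexpected value."""
    # HTTP header names are case-insensitive, so normalize before comparing
    present = {name.lower(): value for name, value in headers.items()}
    problems = []
    for name, expected in REQUIRED_HEADERS.items():
        value = present.get(name)
        if value is None or (expected is not None and value.strip().lower() != expected.lower()):
            problems.append(name)
    return problems
```

To check a running instance, pass in the response headers, e.g. `missing_security_headers(dict(urllib.request.urlopen("http://localhost:7860/").headers.items()))`, and fail the check if the list is non-empty.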
#### **Security Monitoring:**

**Log Analysis Queries:**

```bash
# Monitor API usage patterns
grep "API_ACCESS" /var/log/colpali.log | tail -100

# Detect potential abuse
grep "RATE_LIMIT_EXCEEDED" /var/log/colpali.log

# Check authentication failures
grep "UNAUTHORIZED" /var/log/colpali.log
```

**Alerting Setup:**

- **Rate Limit Violations**: Alert when >50 requests/minute from single IP
- **Authentication Failures**: Alert on repeated unauthorized attempts
- **Unusual Queries**: Alert on suspicious query patterns or injection attempts
- **Resource Usage**: Alert on high CPU/memory usage (potential DoS)
## Models Used

- **ColPali v1.2**: Visual document understanding
- **ColPaliGemma 3B**: Base visual-language model
- **Google Gemini 2.0**: AI chat and question answering
## Configuration Options

### Environment Variables

| Variable                   | Required | Description                                 | Security Impact                     |
| -------------------------- | -------- | ------------------------------------------- | ----------------------------------- |
| `VESPA_APP_TOKEN_URL`      | Yes\*    | Vespa application URL (token auth)          | **HIGH** - Remote access            |
| `VESPA_CLOUD_SECRET_TOKEN` | Yes\*    | Vespa secret token                          | **CRITICAL** - Full database access |
| `USE_MTLS`                 | No       | Use mTLS instead of token auth              | **MEDIUM** - Auth method            |
| `VESPA_APP_MTLS_URL`       | Yes\*\*  | Vespa application URL (mTLS)                | **HIGH** - Remote access            |
| `VESPA_CLOUD_MTLS_KEY`     | Yes\*\*  | mTLS private key                            | **CRITICAL** - TLS credentials      |
| `VESPA_CLOUD_MTLS_CERT`    | Yes\*\*  | mTLS certificate                            | **HIGH** - TLS credentials          |
| `GEMINI_API_KEY`           | No       | Google Gemini API key                       | **HIGH** - AI access/costs          |
| `LOG_LEVEL`                | No       | Logging level (DEBUG, INFO, WARNING, ERROR) | **LOW** - Debug info                |
| `HOT_RELOAD`               | No       | Enable hot reload in development            | **LOW** - Dev convenience           |
#### **Security-Related Environment Variables (Recommended)**

| Variable                   | Required      | Description                          | Default |
| -------------------------- | ------------- | ------------------------------------ | ------- |
| `COLPALI_API_KEY`          | **YES\*\*\*** | API key for endpoint authentication  | None    |
| `ALLOWED_ORIGINS`          | **YES\*\*\*** | Comma-separated allowed CORS origins | None    |
| `RATE_LIMIT_REQUESTS`      | No            | Max requests per window per IP       | `10`    |
| `RATE_LIMIT_WINDOW`        | No            | Rate limit window in seconds         | `60`    |
| `MAX_QUERY_LENGTH`         | No            | Maximum query string length          | `500`   |
| `ENABLE_AUDIT_LOGGING`     | No            | Log all API requests for security    | `false` |
| `SECURITY_HEADERS_ENABLED` | No            | Enable security headers              | `true`  |
| `CSRF_SECRET`              | **YES\*\*\*** | Secret for CSRF token generation     | None    |

**Example Security-Enhanced `.env`:**

```bash
# Existing configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_vespa_secret_token
GEMINI_API_KEY=your_gemini_api_key

# NEW: Security configuration
COLPALI_API_KEY=your_strong_random_api_key_here
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60
MAX_QUERY_LENGTH=500
ENABLE_AUDIT_LOGGING=true
SECURITY_HEADERS_ENABLED=true
CSRF_SECRET=your_random_csrf_secret_here

# Development vs Production
NODE_ENV=production  # Enable secure cookies
LOG_LEVEL=INFO  # Don't expose debug info in production
```

\*Required for token authentication

\*\*Required for mTLS authentication

\*\*\*Required for production security
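
The security variables above are recommendations, so an application adopting them needs a loader that applies the documented defaults and fails fast on the required ones. A sketch of such a loader, assuming the values come from the process environment after the `.env` file is loaded; `SecuritySettings` and its field names are hypothetical.

```python
from collections.abc import Mapping
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SecuritySettings:
    api_key: str
    allowed_origins: tuple[str, ...]
    csrf_secret: str
    rate_limit_requests: int = 10
    rate_limit_window: int = 60
    max_query_length: int = 500
    enable_audit_logging: bool = False
    security_headers_enabled: bool = True

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "SecuritySettings":
        # The three variables marked *** in the table have no safe default
        required = ("COLPALI_API_KEY", "ALLOWED_ORIGINS", "CSRF_SECRET")
        missing = [key for key in required if not env.get(key)]
        if missing:
            raise ValueError(f"Missing required security settings: {', '.join(missing)}")
        origins = tuple(o.strip() for o in env["ALLOWED_ORIGINS"].split(",") if o.strip())
        return cls(
            api_key=env["COLPALI_API_KEY"],
            allowed_origins=origins,
            csrf_secret=env["CSRF_SECRET"],
            rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "10")),
            rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
            max_query_length=int(env.get("MAX_QUERY_LENGTH", "500")),
            enable_audit_logging=_as_bool(env.get("ENABLE_AUDIT_LOGGING", "false")),
            security_headers_enabled=_as_bool(env.get("SECURITY_HEADERS_ENABLED", "true")),
        )
```

Usage: `SecuritySettings.from_env(os.environ)` at startup; the frozen dataclass keeps settings read-only once the app is running.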
## Troubleshooting

### **LOCAL Processing Issues**

**ColPali model fails to load:**

```bash
# Check GPU memory
nvidia-smi  # For NVIDIA GPUs
# or
system_profiler SPDisplaysDataType  # For Apple Silicon

# Clear model cache if corrupted
rm -rf ~/.cache/huggingface/hub/models--vidore--colpali-v1.2
```

**Out of memory errors:**

- Reduce batch size in `feed_vespa.py` (try `batch_size=1`)
- Close other applications to free RAM/VRAM
- Use CPU processing if GPU memory is insufficient: `CUDA_VISIBLE_DEVICES="" python main.py`

**Slow processing on CPU:**

- Expected behavior: ColPali requires significant computation
- Consider upgrading to a GPU or Apple Silicon for a 5-10x speedup
- Process documents overnight for large collections
### **REMOTE Processing Issues**

**Connection to Vespa fails:**

- Verify your Vespa URL and credentials in `.env`
- Check if the Vespa application is deployed and running
- Ensure network connectivity: `ping your-app.vespa-cloud.com`
- Validate authentication tokens haven't expired

**Document upload fails:**

- Check Vespa Cloud storage quota and billing
- Verify embedding format matches Vespa schema
- Ensure stable internet connection for large uploads

**Search returns no results:**

- Confirm documents were successfully uploaded to Vespa
- Check if embeddings were properly indexed
- Verify query processing isn't failing locally

### **MIXED (Local + Remote) Issues**

**Chat features don't work:**

- **LOCAL**: Verify document images are being generated locally
- **REMOTE**: Check `GEMINI_API_KEY` is set correctly
- **REMOTE**: Verify Gemini API quota and billing
- **NETWORK**: Ensure images can be sent to Gemini API

**Similarity maps missing:**

- **LOCAL**: Confirm ColPali model loaded successfully
- **LOCAL**: Check if similarity map generation completed
- **REMOTE**: Verify Vespa returned similarity data
- **BROWSER**: Clear browser cache for static files
### Performance Tips

**LOCAL Optimization:**

- Use GPU acceleration for 5-10x faster model inference
- Optimize batch sizes based on available memory
- Use SSD storage for faster model loading
- Consider quantized models for lower memory usage

**REMOTE Optimization:**

- Use Vespa's HNSW indexing for faster search
- Optimize embedding dimensions vs accuracy tradeoff
- Enable compression for faster network transfer
- Use multiple Vespa instances for high availability

**NETWORK Optimization:**

- Process documents in batches to reduce upload overhead
- Use compression for embedding transfer
- Consider regional Vespa deployment for lower latency
## License

Apache-2.0

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests and linting
5. Submit a pull request

## Support

For issues and questions:

- Check the troubleshooting section
- Review Vespa and ColPali documentation
- Open an issue on the repository