ColPali 🤝 Vespa - Visual Retrieval System

A visual document retrieval system that combines ColPali (a ColBERT-style late-interaction retriever built on the PaliGemma vision-language model) with Vespa for scalable document search and question answering.

🌟 Features

🔍 Visual Document Search

  • Multi-modal retrieval: Search through PDF documents using natural language queries
  • Visual understanding: ColPali model processes document images and text simultaneously
  • Token-level similarity maps: Visualize exactly which parts of documents match your query
  • Multiple ranking algorithms: Choose between hybrid, semantic, and other ranking methods

🧠 AI-Powered Chat

  • Intelligent Q&A: Ask questions about retrieved documents using Google Gemini 2.0
  • Context-aware responses: AI analyzes document images to provide accurate answers
  • Real-time streaming: Get responses as they're generated

⚡ Scalable Infrastructure

  • Vespa integration: Enterprise-grade search platform for large document collections
  • Real-time processing: Instant search results and similarity map generation
  • Cloud-ready: Supports Vespa Cloud deployment with secure authentication

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │    Backend      │    │   Vespa Cloud   │
│   (Browser)     │    │   (Your Local   │    │   (Remote)      │
│                 │    │    Computer)    │    │                 │
│ • Search UI     │◄──►│ • ColPali Model │◄──►│ • Document Store│
│ • Similarity    │    │ • Query Proc.   │    │ • Vector Search │
│   Maps          │    │ • Sim Map Gen.  │    │ • Ranking       │
│ • Chat Interface│    │ • Gemini Int.   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
        ↑                        ↑                        ↑
   Web Browser              LOCAL AI               REMOTE Storage

🏠 LOCAL Processing (Your Computer)

All ColPali model inference happens on your local machine; only the optional Gemini chat calls a remote model:

  • ColPali Model: Runs locally on your GPU/CPU (~7GB model)
  • Document Processing: PDF → Images → Embeddings (local)
  • Query Processing: Text → Embeddings (local)
  • Similarity Maps: Visual attention generation (local)
  • Gemini Chat: Retrieved images are prepared locally, then sent to the Gemini API

Device Detection:

from colpali_engine.utils.torch_utils import get_torch_device

device = get_torch_device("auto")  # Detects: CUDA, MPS (Apple), or CPU
print(f"Using device: {device}")   # Shows YOUR hardware

☁️ REMOTE Processing (Vespa Cloud)

Only storage and search index operations happen remotely:

  • Document Storage: Stores processed embeddings (not raw models)
  • Vector Search: Fast similarity search across document collection
  • Query Routing: Handles search requests and ranking
  • Metadata Storage: Document titles, URLs, page numbers

🔄 Complete Data Flow

Document Upload Process:

  1. LOCAL: Your computer downloads PDF from URL
  2. LOCAL: ColPali converts PDF pages to images
  3. LOCAL: ColPali generates visual embeddings (1024 patches × 128 dims)
  4. LOCAL: Embeddings converted to binary format for efficiency
  5. REMOTE: Binary embeddings uploaded to Vespa Cloud
  6. REMOTE: Vespa indexes embeddings for fast search
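
Step 4 can be illustrated with a minimal sketch. It assumes the binary format is a sign-based quantization (one bit per dimension, packed eight per byte), which is a common choice but is not confirmed by this README; `binarize_embedding` and `hamming_distance` are hypothetical helper names, not functions from this repository.

```python
from typing import List


def binarize_embedding(vector: List[float]) -> bytes:
    """Pack a float vector into bits: 1 where the value is > 0, else 0."""
    if len(vector) % 8 != 0:
        raise ValueError("vector length must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(vector), 8):
        byte = 0
        for value in vector[start:start + 8]:
            # Shift left and append the next bit (most significant bit first).
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two packed vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("packed vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


patch = [0.5, -0.1] * 64           # one made-up 128-dim patch embedding
packed = binarize_embedding(patch)
print(len(packed))                 # 16 bytes instead of 128 floats
```

Under this assumption each 128-dimensional patch shrinks to 16 bytes, and packed vectors can be compared cheaply by Hamming distance.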

Search Query Process:

  1. LOCAL: You enter search query in browser
  2. LOCAL: ColPali processes query → generates query embeddings
  3. REMOTE: Query embeddings sent to Vespa Cloud
  4. REMOTE: Vespa searches document index, returns matches
  5. LOCAL: ColPali generates similarity maps for results
  6. BROWSER: Results displayed with visual attention maps
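
The matching in step 4 follows the late-interaction idea behind ColPali: every query token is compared against every page patch, the best match per token is kept, and those maxima are summed. The sketch below shows that scoring on tiny made-up vectors. It is an illustration only; the rank profiles actually deployed in Vespa may differ, and `max_sim` and `rank_pages` are hypothetical names.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens: List[Vector], page_patches: List[Vector]) -> float:
    """For each query token take its best-matching patch, then sum the maxima."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(query_tokens: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page ids sorted by descending late-interaction score."""
    scores = {page_id: max_sim(query_tokens, patches) for page_id, patches in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


query = [[1.0, 0.0], [0.0, 1.0]]                 # two query-token embeddings
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8]],          # two patch embeddings per page
    "page_b": [[0.1, 0.1], [0.3, 0.2]],
}
print(rank_pages(query, pages))                  # ['page_a', 'page_b']
```

The per-token maxima are also what the similarity maps visualize: for one query token, the patch scores form a heatmap over the page.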

AI Chat Process:

  1. LOCAL: Retrieved document images processed by your machine
  2. REMOTE: Images + query sent to Google Gemini API
  3. REMOTE: Gemini generates response based on visual content
  4. BROWSER: Streaming response displayed in real-time
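
Step 4 relies on Server-Sent Events (the /get-message endpoint is listed as SSE further down). A minimal sketch of how text chunks can be framed for an SSE stream follows; the `sse_events` helper and the final `close` event name are assumptions for illustration, not this application's code.

```python
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str], event: str = "message") -> Iterator[str]:
    """Wrap text chunks as Server-Sent Events frames."""
    for chunk in chunks:
        # Each line of a multi-line chunk needs its own "data:" field.
        data_lines = "\n".join(f"data: {line}" for line in chunk.split("\n"))
        # A blank line terminates the frame.
        yield f"event: {event}\n{data_lines}\n\n"
    # A final named event lets the browser know it can close the stream.
    yield "event: close\ndata: done\n\n"


frames = list(sse_events(["Hello", "world\nagain"]))
```

Each frame can be written to the response as soon as the model produces the corresponding chunk, which is what makes the answer appear incrementally in the browser.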

Core Components

  • ColPali Model: Visual-language model for document understanding (LOCAL)
  • Vespa Search: Distributed search and storage engine (REMOTE)
  • FastHTML Frontend: Modern, responsive web interface (BROWSER)
  • Gemini Integration: AI-powered question answering (REMOTE API)
  • Similarity Map Generator: Visual attention visualization (LOCAL)

💻 System Requirements

LOCAL Machine Requirements (For AI Processing)

Minimum:

  • CPU: Modern multi-core processor (Intel/AMD/Apple Silicon)
  • RAM: 8GB+ (16GB recommended)
  • Storage: 10GB free space (for model cache)
  • Python: 3.10 to 3.12 (>=3.10, <3.13)

Recommended:

  • GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3070/4060 or better)
  • Apple: M1/M2/M3 Mac (uses Metal Performance Shaders)
  • RAM: 16GB+ for smoother processing
  • Storage: SSD for faster model loading

Performance Examples:

  • RTX 4090: ~1-2 seconds per query
  • RTX 3070: ~3-5 seconds per query
  • Apple M2: ~4-6 seconds per query
  • CPU Only: ~15-30 seconds per query

REMOTE Requirements (Vespa Cloud)

What you need:

  • Vespa Cloud account (handles all remote processing)
  • Internet connection (for uploading embeddings and search queries)
  • Authentication tokens (provided by Vespa Cloud)

What Vespa Cloud provides:

  • Scalable storage for any number of documents
  • Sub-second search across millions of embeddings
  • High availability with automatic failover
  • Deployment zones in multiple regions for lower latency

💰 Cost Breakdown

FREE Components

  • ColPali Model: Open source, runs locally (no per-query costs)
  • Python Application: Apache-2.0 licensed, completely free
  • Local Processing: Uses your own hardware (no cloud AI fees)

PAID Components

  • Vespa Cloud: Pay for storage and search operations
    • ~$0.001 per 1000 searches
    • ~$0.10 per GB storage per month
  • Google Gemini API: Optional, for chat features only
    • ~$0.01 per 1000 image tokens
    • Only used when you ask questions about documents

Cost Examples (Monthly)

  • Personal Use (100 documents, 1000 searches): ~$5-10/month
  • Small Business (1000 documents, 10k searches): ~$20-50/month
  • Enterprise (10k+ documents, 100k+ searches): $200+/month

💡 Cost Optimization Tips:

  • Use local Vespa installation to avoid cloud costs
  • Disable Gemini chat if not needed (saves API costs)
  • Process documents in batches to minimize upload time

🚀 Quick Start

Prerequisites

  • Python 3.10 to 3.12 (>=3.10, <3.13)
  • 8GB+ RAM for ColPali model
  • Vespa Cloud account or local Vespa installation
  • Google Gemini API key (optional, for chat features)
  • GPU recommended but not required

1. Installation

# Clone the repository
git clone <repository-url>
cd colpali-vespa-visual-retrieval

# Install dependencies
pip install -e .

# For development
pip install -e ".[dev]"

# For document feeding capabilities
pip install -e ".[feed]"

2. Environment Configuration

Create a .env file with your configuration:

# Vespa Configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_secret_token

# Alternative: mTLS Authentication
USE_MTLS=false
VESPA_APP_MTLS_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_MTLS_KEY="-----BEGIN PRIVATE KEY-----..."
VESPA_CLOUD_MTLS_CERT="-----BEGIN CERTIFICATE-----..."

# Optional: Gemini AI (for chat features)
GEMINI_API_KEY=your_gemini_api_key

# Optional: Logging
LOG_LEVEL=INFO
HOT_RELOAD=false
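
The two authentication modes above are mutually exclusive, so it helps to check at startup that the variables for the selected mode are all present. The sketch below shows one way to do that. `resolve_vespa_auth` is a hypothetical helper, and the rule that `USE_MTLS=true` selects mTLS is inferred from the variable names rather than taken from the application code.

```python
from typing import Dict, List, Mapping


def resolve_vespa_auth(env: Mapping[str, str]) -> Dict[str, str]:
    """Pick token or mTLS settings from environment-style variables."""
    use_mtls = env.get("USE_MTLS", "false").strip().lower() == "true"
    required: List[str] = (
        ["VESPA_APP_MTLS_URL", "VESPA_CLOUD_MTLS_KEY", "VESPA_CLOUD_MTLS_CERT"]
        if use_mtls
        else ["VESPA_APP_TOKEN_URL", "VESPA_CLOUD_SECRET_TOKEN"]
    )
    # Fail early with a clear message instead of failing on the first request.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    settings = {name: env[name] for name in required}
    settings["mode"] = "mtls" if use_mtls else "token"
    return settings


example = {
    "VESPA_APP_TOKEN_URL": "https://your-app.vespa-cloud.com",
    "VESPA_CLOUD_SECRET_TOKEN": "your_secret_token",
}
print(resolve_vespa_auth(example)["mode"])  # token
```

In a real run you would pass `os.environ` (after loading the .env file) instead of the example dictionary.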

3. Deploy Vespa Application

# Deploy the Vespa schema and configuration
python deploy_vespa_app.py \
  --tenant_name your_tenant \
  --vespa_application_name colpalidemo \
  --token_id_write colpalidemo_write \
  --token_id_read colpalidemo_read

4. Run the Application

python main.py

The application will be available at http://localhost:7860

📚 Document Management

Uploading Documents

Use the feeding script to process and upload PDF documents:

python feed_vespa.py \
  --application_name colpalidemo \
  --vespa_schema_name pdf_page

Document Processing Pipeline (LOCAL → REMOTE):

  1. PDF Download (LOCAL): Your computer downloads PDFs from URLs
  2. PDF Conversion (LOCAL): PDFs converted to images (one per page)
  3. ColPali Processing (LOCAL): Each page processed by ColPali model on YOUR GPU/CPU
  4. Embedding Generation (LOCAL): Visual embeddings created (1024 patches × 128 dimensions)
  5. Binary Encoding (LOCAL): Embeddings converted to efficient binary format
  6. Vespa Upload (REMOTE): Binary embeddings uploaded to Vespa Cloud
  7. Search Indexing (REMOTE): Vespa indexes embeddings for fast retrieval

⚠️ Important Notes:

  • Processing Time: Expect 5-30 seconds per page depending on your hardware
  • Network Usage: Only binary embeddings are uploaded (1024 patches × 128 bits = ~16KB per page, vs ~1MB for the original page image)
  • Privacy: Original PDFs and images stay on your local machine
  • Storage: Raw images cached locally for similarity map generation

Supported Operations

  • ✅ Upload Documents: Add new PDFs to the system
  • ✅ Search Documents: Query existing documents
  • ✅ View Documents: Browse stored documents
  • ❌ Remove Documents: Not currently implemented
  • ❌ Update Documents: Not currently implemented

🔐 Authentication & Security

🛡️ Current Security Implementation

SECURE Components:

Vespa Authentication (REMOTE)

  • Token Authentication: Bearer tokens for Vespa Cloud API access
  • mTLS Certificates: Mutual TLS for enterprise security
  • Encrypted Communication: HTTPS/TLS for all Vespa connections

API Key Management (LOCAL)

  • Environment Variables: Sensitive keys stored in .env files
  • API Key Rotation: Google Gemini supports key rotation
  • Local Storage: Keys never transmitted except to authorized APIs

LIMITED Security Components:

Session Management

# Basic UUID session tracking (FastHTML, Python)
import uuid

session["session_id"] = str(uuid.uuid4())

// HTTP-only cookies (Next.js, TypeScript)
cookieStore.set(SESSION_KEY, newSessionId, {
  httpOnly: true,
  secure: process.env.NODE_ENV === "production",
  sameSite: "lax",
  maxAge: 60 * 60 * 24 * 30, // 30 days
});

Basic Request Validation

# HTMX request validation (FastHTML, Python)
if "hx-request" not in request.headers:
    return RedirectResponse("/search")

// Parameter validation (Next.js, TypeScript)
if (!query) {
  return NextResponse.json({ error: "Query is required" }, { status: 400 });
}

⚠️ Security Limitations & Risks

MISSING Security Features:

❌ No API Authentication

  • Local API endpoints are completely open
  • No rate limiting or abuse protection
  • No user authentication or authorization
  • Anyone can access /fetch_results, /get_sim_map endpoints

❌ No Input Sanitization

// Raw user input is passed straight through to the backend (Next.js, TypeScript)
const query = searchParams.get("query"); // No validation/sanitization
const ranking = searchParams.get("ranking"); // No input filtering

❌ No Security Headers

  • No CORS configuration
  • No Content Security Policy (CSP)
  • No X-Frame-Options protection
  • No X-Content-Type-Options validation

❌ No Rate Limiting

  • Unlimited API requests
  • No protection against DoS attacks
  • No query throttling or user limits

❌ No CSRF Protection

  • No token validation for state-changing operations
  • Cross-site request forgery possible
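
One common way to close this gap is a signed, session-bound token that the server issues with each page and checks on every state-changing request. The sketch below uses only the standard library. The `issue_csrf_token` and `verify_csrf_token` helpers and the `nonce.signature` token layout are assumptions for illustration, not part of this project.

```python
import hashlib
import hmac
import secrets


def _sign(session_id: str, nonce: str, secret: bytes) -> str:
    message = f"{session_id}:{nonce}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def issue_csrf_token(session_id: str, secret: bytes) -> str:
    """Return 'nonce.signature', binding the token to one session."""
    nonce = secrets.token_hex(16)
    return f"{nonce}.{_sign(session_id, nonce, secret)}"


def verify_csrf_token(token: str, session_id: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    nonce, sep, signature = token.partition(".")
    if not sep or not nonce or not signature:
        return False
    expected = _sign(session_id, nonce, secret)
    return hmac.compare_digest(signature.encode(), expected.encode())


secret = b"example-secret-change-me"   # placeholder; load a real secret from the environment
token = issue_csrf_token("session-123", secret)
print(verify_csrf_token(token, "session-123", secret))  # True
```

A token issued for one session fails verification for any other session, so a forged cross-site request cannot reuse it.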

🎯 Security Recommendations

IMMEDIATE (High Priority)

1. Add API Authentication

// middleware.ts - Add API key validation
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  const apiKey = request.headers.get("X-API-Key");
  if (!apiKey || apiKey !== process.env.COLPALI_API_KEY) {
    return new Response("Unauthorized", { status: 401 });
  }
  return NextResponse.next(); // Key is valid: continue to the route
}

2. Implement Rate Limiting

// Use next-rate-limit or similar
import rateLimit from "@/lib/rate-limit";

const limiter = rateLimit({
  interval: 60 * 1000, // 1 minute
  uniqueTokenPerInterval: 500, // Track up to 500 distinct clients per interval
});

await limiter.check(10, getClientIP(request)); // 10 requests per minute

3. Add Security Headers

// next.config.js
const securityHeaders = [
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
  {
    key: "Content-Security-Policy",
    value: "default-src 'self'; script-src 'self' 'unsafe-inline'",
  },
];
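
The same headers can also be applied on the Python backend. The sketch below is framework-agnostic: it merges the headers into a plain dictionary and leaves the wiring into a FastHTML/Starlette response to the application. `with_security_headers` is a hypothetical helper name.

```python
from typing import Dict, Mapping

SECURITY_HEADERS: Dict[str, str] = {
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Content-Security-Policy": "default-src 'self'; script-src 'self' 'unsafe-inline'",
}


def with_security_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the response headers with the security headers added.

    Headers already set by the handler are kept (names compared case-insensitively).
    """
    merged = dict(headers)
    present = {name.lower() for name in merged}
    for name, value in SECURITY_HEADERS.items():
        if name.lower() not in present:
            merged[name] = value
    return merged


response_headers = with_security_headers({"Content-Type": "text/html"})
print(response_headers["X-Content-Type-Options"])  # nosniff
```

Keeping handler-set values means a single route can relax one header (for example, to allow framing) without changing the defaults.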

4. Input Validation & Sanitization

import { z } from "zod";

const SearchSchema = z.object({
  query: z
    .string()
    .min(1)
    .max(500)
    .regex(/^[a-zA-Z0-9\s\.\?\!]*$/),
  ranking: z.enum(["hybrid", "colpali", "bm25"]),
});

MEDIUM Priority

5. CORS Configuration

// Restrict origins to known domains
const corsHeaders = {
  "Access-Control-Allow-Origin": "https://yourdomain.com",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization",
};

6. Request Size Limits

// Limit request payload sizes
export const config = {
  api: {
    bodyParser: {
      sizeLimit: "1mb",
    },
  },
};

7. Audit Logging

# Log all API access with IP, timestamp, and queries
logger.info(f"API_ACCESS: {client_ip} - {endpoint} - {query[:100]}")
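
Because the query is user-controlled, it is worth stripping control characters before writing it to the log, so that a crafted query cannot inject fake log lines. A minimal sketch using the same `API_ACCESS` format follows; the logger name and both helper functions are assumptions.

```python
import logging

logger = logging.getLogger("colpali.audit")  # hypothetical logger name


def format_audit_entry(client_ip: str, endpoint: str, query: str, max_len: int = 100) -> str:
    """Build one API_ACCESS line with control characters replaced by spaces."""
    cleaned = "".join(ch if ch.isprintable() else " " for ch in query)
    return f"API_ACCESS: {client_ip} - {endpoint} - {cleaned[:max_len]}"


def log_api_access(client_ip: str, endpoint: str, query: str) -> None:
    logger.info(format_audit_entry(client_ip, endpoint, query))


entry = format_audit_entry("127.0.0.1", "/fetch_results", "report\nAPI_ACCESS: forged")
print(entry)  # API_ACCESS: 127.0.0.1 - /fetch_results - report API_ACCESS: forged
```

The forged text still appears in the log, but on the same line as the real entry, so it cannot pass for a separate request.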

LONG-TERM (Production Ready)

8. User Authentication (Optional)

// Add NextAuth.js or similar for user accounts
// Implement role-based access control
// Add document ownership and permissions

9. Network Security

# Deploy behind reverse proxy (nginx/cloudflare)
# Enable DDoS protection
# Use Web Application Firewall (WAF)

10. Data Privacy Controls

// Implement data retention policies
// Add user data deletion capabilities
// GDPR compliance features

🔒 Security Best Practices

For LOCAL Development:

  • Never commit API keys to version control
  • Use specific environment variable names (e.g. GEMINI_API_KEY, not a generic API_KEY)
  • Rotate API keys regularly (monthly)
  • Enable firewall on development machines
  • Use HTTPS locally when testing production-like configurations

For PRODUCTION Deployment:

  • Deploy behind CDN/WAF (Cloudflare, AWS Shield)
  • Enable rate limiting at infrastructure level
  • Use container security scanning
  • Implement monitoring and alerting
  • Regular security audits and penetration testing

For REMOTE Services:

  • Vespa Cloud: Follows enterprise security standards
  • Gemini API: Google-managed security and compliance
  • Environment Isolation: Separate dev/staging/prod credentials

🚨 Current Risk Level: MEDIUM

Suitable for:

  • ✅ Personal projects and demos
  • ✅ Internal company tools (behind firewall)
  • ✅ Research and development environments

NOT suitable for:

  • ❌ Public internet deployment
  • ❌ Customer-facing applications
  • ❌ Production environments with sensitive data
  • ❌ Commercial applications without security hardening

🎯 Usage Guide

Basic Search

  1. Navigate to the homepage
  2. Enter your search query in natural language
  3. Select ranking method (hybrid, semantic, etc.)
  4. View results with similarity maps

Similarity Maps

  • Click on token buttons to see which parts of documents match specific query terms
  • Visual heatmaps show attention patterns
  • Reset button returns to original document view

AI Chat

  • Ask questions about retrieved documents
  • Chat responses are based on document content
  • Streaming responses for real-time interaction

Search Rankings

  • Hybrid: Combines multiple ranking signals
  • Semantic: Pure semantic similarity
  • BM25: Traditional text-based ranking
  • ColPali: Visual-first ranking

🛠️ Development

Project Structure

├── main.py                 # Application entry point
├── backend/
│   ├── colpali.py         # ColPali model integration
│   ├── vespa_app.py       # Vespa client and queries
│   └── modelmanager.py    # Model management utilities
├── frontend/
│   ├── app.py             # UI components
│   └── layout.py          # Layout templates
├── feed_vespa.py          # Document upload script
├── deploy_vespa_app.py    # Vespa deployment script
├── colpali-with-snippets/ # Vespa schema definitions
└── static/                # Static assets and generated files

Running in Development

# Enable hot reload
export HOT_RELOAD=true
python main.py

# Or set in .env
echo "HOT_RELOAD=true" >> .env

Code Quality

# Format code
ruff format .

# Lint code
ruff check .

📊 API Endpoints

Current API Routes (⚠️ UNSECURED)

| Endpoint | Method | Description | Security Status |
|---|---|---|---|
| / | GET | Homepage | ✅ Public (safe) |
| /search | GET | Search interface | ✅ Public (safe) |
| /fetch_results | GET | Fetch search results | ⚠️ OPEN API |
| /get_sim_map | GET | Get similarity maps | ⚠️ OPEN API |
| /get-message | GET | Chat with AI (SSE) | ⚠️ OPEN API |
| /full_image | GET | Get full document image | ⚠️ OPEN API |
| /suggestions | GET | Query autocomplete | ⚠️ OPEN API |
| /static/* | GET | Static file serving | ✅ Public (safe) |

Security Analysis by Endpoint

🔒 SECURE Endpoints

  • / and /search: Static HTML pages, no sensitive data
  • /static/*: Public assets (CSS, JS, images)

⚠️ UNSECURED Endpoints (Risk)

/fetch_results - HIGH RISK

# Anyone can perform unlimited searches
curl "http://localhost:7860/fetch_results?query=secret&ranking=hybrid"
  • Risks: Resource abuse, server overload, competitive intelligence gathering
  • Exposes: Search capabilities, document metadata, processing times

/get_sim_map - MEDIUM RISK

# Access similarity maps without authentication
curl "http://localhost:7860/get_sim_map?query_id=123&idx=0&token=word&token_idx=5"
  • Risks: Unauthorized access to visual analysis
  • Exposes: Document visual patterns, query insights

/get-message - HIGH RISK

# Trigger AI processing without limits
curl "http://localhost:7860/get-message?query_id=123&query=question&doc_ids=doc1,doc2"
  • Risks: Gemini API abuse, cost exploitation, resource exhaustion
  • Exposes: AI-generated insights, document content analysis

/full_image - HIGH RISK

# Download any document image
curl "http://localhost:7860/full_image?doc_id=any_document_id"
  • Risks: Unauthorized document access, data leakage
  • Exposes: Full document images, potentially sensitive content

Immediate Security Fixes

1. Add API Key Authentication

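# Python FastHTML middleware (FastHTML apps are Starlette apps)
import hmac
import os
from starlette.responses import JSONResponse

PROTECTED_PATHS = ("/fetch_results", "/get_sim_map", "/get-message", "/full_image")

@app.middleware("http")
async def verify_api_key(request, call_next):
    if request.url.path.startswith(PROTECTED_PATHS):
        api_key = request.headers.get("X-API-Key", "")
        expected = os.getenv("COLPALI_API_KEY", "")
        # Constant-time comparison; also rejects when no key is configured
        if not expected or not hmac.compare_digest(api_key.encode(), expected.encode()):
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)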

2. Implement Rate Limiting

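from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter  # slowapi looks the limiter up on app.state
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@rt("/fetch_results")
@limiter.limit("10/minute")  # 10 requests per minute per IP
async def get_results(request, query: str, ranking: str):
    ...  # existing handler code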
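
To make the per-IP limit concrete, here is a dependency-free sketch of a sliding-window limiter. It illustrates the idea only and is not how slowapi is implemented internally; `SlidingWindowLimiter` is a hypothetical class, and the injectable `clock` exists so the behavior can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


demo_limiter = SlidingWindowLimiter(limit=10, window=60.0)
results = [demo_limiter.allow("203.0.113.7") for _ in range(12)]
print(results.count(True))  # 10
```

This in-memory version only works for a single process; several workers would need a shared store for the counters.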

3. Input Validation

from pydantic import BaseModel, validator

class SearchRequest(BaseModel):
    query: str
    ranking: str

    @validator('query')
    def query_must_be_safe(cls, v):
        if len(v) > 500:
            raise ValueError('Query too long')
        # Add sanitization logic
        return v.strip()
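
The ranking parameter can be checked against an allow-list in the same step. The sketch below does this with the standard library only. The ranking identifiers are assumed from the names in the Search Rankings list and may not match the values the backend actually accepts; `validate_search_params` is a hypothetical helper.

```python
from typing import Tuple

ALLOWED_RANKINGS = {"hybrid", "semantic", "bm25", "colpali"}  # assumed identifiers
MAX_QUERY_LENGTH = 500


def validate_search_params(query: str, ranking: str) -> Tuple[str, str]:
    """Normalize and validate the two search parameters, raising ValueError on bad input."""
    query = " ".join(query.split())  # trim and collapse runs of whitespace
    if not query:
        raise ValueError("Query is required")
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("Query too long")
    ranking = ranking.strip().lower()
    if ranking not in ALLOWED_RANKINGS:
        raise ValueError(f"Unknown ranking: {ranking!r}")
    return query, ranking


print(validate_search_params("  what is   the revenue? ", "Hybrid"))
# ('what is the revenue?', 'hybrid')
```

Validation limits what reaches the model and Vespa. It does not replace escaping the query when it is echoed back into HTML, for example with `html.escape`.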

4. Request Origin Validation

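from starlette.responses import JSONResponse

ALLOWED_ORIGINS = ["http://localhost:3000", "https://yourdomain.com"]

@app.middleware("http")
async def cors_middleware(request, call_next):
    origin = request.headers.get("origin")
    # Same-origin page loads and non-browser clients often omit Origin,
    # so only reject requests that name an origin outside the allow-list.
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return JSONResponse({"error": "Forbidden"}, status_code=403)
    return await call_next(request)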

📈 Recommended API Security Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │  Rate Limiter   │    │   Backend API   │
│                 │    │                 │    │                 │
│ • API Key       │◄──►│ • IP Limiting   │◄──►│ • Input Valid.  │
│ • CORS Headers  │    │ • User Quotas   │    │ • Auth Checks   │
│ • Request Valid.│    │ • DoS Protection│    │ • Audit Logs    │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Benefits:

  • Layer 1: Frontend validates requests before sending
  • Layer 2: Rate limiter prevents abuse and DoS attacks
  • Layer 3: Backend performs final validation and authorization

🔒 Security Implementation Checklist

Before Production Deployment:

CRITICAL (Must Do):

  • Generate API Key: Create strong API key for endpoint authentication
  • Enable Rate Limiting: Implement per-IP request limits
  • Add Security Headers: X-Frame-Options, CSP, X-Content-Type-Options
  • Input Validation: Sanitize all user inputs (query, ranking)
  • CORS Configuration: Restrict origins to known domains only
  • Environment Security: Never commit API keys, use secure .env
  • HTTPS Only: Force TLS in production (no HTTP)

HIGH Priority:

  • Audit Logging: Log all API requests with IP and timestamp
  • Request Size Limits: Prevent large payload attacks
  • Error Handling: Don't expose stack traces or internal details
  • Session Security: HTTP-only, secure, SameSite cookies
  • API Documentation: Document authentication requirements

MEDIUM Priority:

  • User Authentication: Consider adding user accounts for access control
  • Request Timeout: Prevent long-running request abuse
  • Content Validation: Verify response content types
  • Monitoring: Set up alerts for unusual API usage patterns
  • Backup Strategy: Secure backup of environment variables

Security Testing Commands:

Test API Authentication:

# Should fail without API key
curl "http://localhost:7860/fetch_results?query=test&ranking=hybrid"

# Should succeed with API key
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test&ranking=hybrid"

Test Rate Limiting:

# Run multiple requests to trigger rate limit
for i in {1..15}; do
  curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test$i&ranking=hybrid"
  echo "Request $i"
done

Test Input Validation:

# Should reject invalid/malicious inputs
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=<script>alert('xss')</script>&ranking=invalid"

Test Security Headers:

# Check security headers in response
curl -I "http://localhost:7860/"
# Should see: X-Frame-Options, X-Content-Type-Options, etc.

Security Monitoring:

Log Analysis Queries:

# Monitor API usage patterns
grep "API_ACCESS" /var/log/colpali.log | tail -100

# Detect potential abuse
grep "RATE_LIMIT_EXCEEDED" /var/log/colpali.log

# Check authentication failures
grep "UNAUTHORIZED" /var/log/colpali.log

Alerting Setup:

  • Rate Limit Violations: Alert when >50 requests/minute from single IP
  • Authentication Failures: Alert on repeated unauthorized attempts
  • Unusual Queries: Alert on suspicious query patterns or injection attempts
  • Resource Usage: Alert on high CPU/memory usage (potential DoS)
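
The first alert can be prototyped with a small log scan. The sketch below assumes the `API_ACCESS: <ip> - <endpoint> - <query>` message format shown in the audit-logging example and assumes the caller passes only the lines from the time slice of interest (for example, the last minute). `ips_over_threshold` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Dict, Iterable

ACCESS_PATTERN = re.compile(r"API_ACCESS: (\S+) - (\S+) - ")


def count_requests_by_ip(lines: Iterable[str]) -> Counter:
    """Count API_ACCESS entries per client IP, ignoring unrelated lines."""
    counts: Counter = Counter()
    for line in lines:
        match = ACCESS_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def ips_over_threshold(lines: Iterable[str], threshold: int = 50) -> Dict[str, int]:
    """Return the IPs whose request count exceeds the threshold."""
    return {ip: n for ip, n in count_requests_by_ip(lines).items() if n > threshold}


sample = ["API_ACCESS: 198.51.100.4 - /fetch_results - test"] * 51
sample += ["API_ACCESS: 203.0.113.7 - /get_sim_map - chart"] * 3
print(ips_over_threshold(sample))  # {'198.51.100.4': 51}
```

In production the same check would more likely run inside a log pipeline or monitoring system; the sketch only shows the counting rule.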

🧪 Models Used

  • ColPali v1.2: Visual document understanding
  • PaliGemma 3B: Base vision-language model that ColPali is fine-tuned from
  • Google Gemini 2.0: AI chat and question answering

🔧 Configuration Options

Environment Variables

| Variable | Required | Description | Security Impact |
|---|---|---|---|
| VESPA_APP_TOKEN_URL | Yes* | Vespa application URL (token auth) | HIGH - Remote access |
| VESPA_CLOUD_SECRET_TOKEN | Yes* | Vespa secret token | CRITICAL - Full database access |
| USE_MTLS | No | Use mTLS instead of token auth | MEDIUM - Auth method |
| VESPA_APP_MTLS_URL | Yes** | Vespa application URL (mTLS) | HIGH - Remote access |
| VESPA_CLOUD_MTLS_KEY | Yes** | mTLS private key | CRITICAL - TLS credentials |
| VESPA_CLOUD_MTLS_CERT | Yes** | mTLS certificate | HIGH - TLS credentials |
| GEMINI_API_KEY | No | Google Gemini API key | HIGH - AI access/costs |
| LOG_LEVEL | No | Logging level (DEBUG, INFO, WARNING, ERROR) | LOW - Debug info |
| HOT_RELOAD | No | Enable hot reload in development | LOW - Dev convenience |

🔒 Security-Related Environment Variables (Recommended)

| Variable | Required | Description | Default |
|---|---|---|---|
| COLPALI_API_KEY | Yes*** | API key for endpoint authentication | None |
| ALLOWED_ORIGINS | Yes*** | Comma-separated allowed CORS origins | None |
| RATE_LIMIT_REQUESTS | No | Max requests per minute per IP | 10 |
| RATE_LIMIT_WINDOW | No | Rate limit window in seconds | 60 |
| MAX_QUERY_LENGTH | No | Maximum query string length | 500 |
| ENABLE_AUDIT_LOGGING | No | Log all API requests for security | false |
| SECURITY_HEADERS_ENABLED | No | Enable security headers | true |
| CSRF_SECRET | Yes*** | Secret for CSRF token generation | None |

Example Security-Enhanced .env:

# Existing configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_vespa_secret_token
GEMINI_API_KEY=your_gemini_api_key

# NEW: Security configuration
COLPALI_API_KEY=your_strong_random_api_key_here
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60
MAX_QUERY_LENGTH=500
ENABLE_AUDIT_LOGGING=true
SECURITY_HEADERS_ENABLED=true
CSRF_SECRET=your_random_csrf_secret_here

# Development vs Production
NODE_ENV=production  # Enable secure cookies
LOG_LEVEL=INFO       # Don't expose debug info in production

*Required for token authentication
**Required for mTLS authentication
***Required for production security
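
Since the security variables above are recommendations rather than settings the application already reads, a loader for them has to be written. The sketch below shows one possible shape, applying the defaults from the table. `SecuritySettings` and `load_security_settings` are hypothetical names, and the accepted boolean spellings are an assumption.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class SecuritySettings:
    api_key: str
    allowed_origins: List[str]
    rate_limit_requests: int = 10
    rate_limit_window: int = 60
    max_query_length: int = 500
    audit_logging: bool = False
    security_headers: bool = True


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_security_settings(env: Mapping[str, str]) -> SecuritySettings:
    """Read the recommended security variables, applying the table's defaults."""
    api_key = env.get("COLPALI_API_KEY", "")
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip()]
    if not api_key or not origins:
        raise ValueError("COLPALI_API_KEY and ALLOWED_ORIGINS must be set")
    return SecuritySettings(
        api_key=api_key,
        allowed_origins=origins,
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "10")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
        max_query_length=int(env.get("MAX_QUERY_LENGTH", "500")),
        audit_logging=_as_bool(env.get("ENABLE_AUDIT_LOGGING", "false")),
        security_headers=_as_bool(env.get("SECURITY_HEADERS_ENABLED", "true")),
    )


settings = load_security_settings({
    "COLPALI_API_KEY": "your_strong_random_api_key_here",
    "ALLOWED_ORIGINS": "http://localhost:3000,https://yourdomain.com",
    "ENABLE_AUDIT_LOGGING": "true",
})
print(settings.allowed_origins)  # ['http://localhost:3000', 'https://yourdomain.com']
```

Parsing everything once at startup means a malformed value, such as a non-numeric rate limit, fails immediately instead of on the first request.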

🚨 Troubleshooting

LOCAL Processing Issues

ColPali model fails to load:

# Check GPU memory
nvidia-smi  # For NVIDIA GPUs
# or
system_profiler SPDisplaysDataType  # For Apple Silicon

# Clear model cache if corrupted
rm -rf ~/.cache/huggingface/hub/models--vidore--colpali-v1.2

Out of memory errors:

  • Reduce batch size in feed_vespa.py (try batch_size=1)
  • Close other applications to free RAM/VRAM
  • Use CPU processing if GPU memory insufficient: CUDA_VISIBLE_DEVICES="" python main.py

Slow processing on CPU:

  • Expected behavior - ColPali requires significant computation
  • Consider upgrading to GPU or Apple Silicon for 5-10x speedup
  • Process documents overnight for large collections

REMOTE Processing Issues

Connection to Vespa fails:

  • Verify your Vespa URL and credentials in .env
  • Check if the Vespa application is deployed and running
  • Ensure network connectivity: ping your-app.vespa-cloud.com
  • Validate authentication tokens haven't expired

Document upload fails:

  • Check Vespa Cloud storage quota and billing
  • Verify embedding format matches Vespa schema
  • Ensure stable internet connection for large uploads

Search returns no results:

  • Confirm documents were successfully uploaded to Vespa
  • Check if embeddings were properly indexed
  • Verify query processing isn't failing locally

MIXED (Local + Remote) Issues

Chat features don't work:

  • LOCAL: Verify document images are being generated locally
  • REMOTE: Check GEMINI_API_KEY is set correctly
  • REMOTE: Verify Gemini API quota and billing
  • NETWORK: Ensure images can be sent to Gemini API

Similarity maps missing:

  • LOCAL: Confirm ColPali model loaded successfully
  • LOCAL: Check if similarity map generation completed
  • REMOTE: Verify Vespa returned similarity data
  • BROWSER: Clear browser cache for static files

Performance Tips

LOCAL Optimization:

  • Use GPU acceleration for 5-10x faster model inference
  • Optimize batch sizes based on available memory
  • Use SSD storage for faster model loading
  • Consider quantized models for lower memory usage

REMOTE Optimization:

  • Use Vespa's HNSW indexing for faster search
  • Optimize embedding dimensions vs accuracy tradeoff
  • Enable compression for faster network transfer
  • Use multiple Vespa instances for high availability

NETWORK Optimization:

  • Process documents in batches to reduce upload overhead
  • Use compression for embedding transfer
  • Consider regional Vespa deployment for lower latency

📄 License

Apache-2.0

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and linting
  5. Submit a pull request

📞 Support

For issues and questions:

  • Check the troubleshooting section
  • Review Vespa and ColPali documentation
  • Open an issue on the repository