ColPali 🤝 Vespa - Visual Retrieval System

A visual document retrieval system that combines ColPali (a ColBERT-style late-interaction retriever built on the PaliGemma vision-language model) with Vespa for scalable document search and question answering.

🌟 Features

🔍 Visual Document Search

Multi-modal retrieval: Search through PDF documents using natural language queries
Visual understanding: ColPali embeds page images directly, so text, layout, tables, and figures are captured without a separate OCR step
Token-level similarity maps: Visualize exactly which parts of documents match your query
Multiple ranking algorithms: Choose between hybrid, semantic, and other ranking methods

🧠 AI-Powered Chat

Intelligent Q&A: Ask questions about retrieved documents using Google Gemini 2.0
Context-aware responses: Gemini answers using the retrieved page images as context
Real-time streaming: Get responses as they're generated

⚡ Scalable Infrastructure

Vespa integration: Enterprise-grade search platform for large document collections
Real-time processing: Low-latency search results and on-demand similarity map generation
Cloud-ready: Supports Vespa Cloud deployment with secure authentication

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │    Backend      │    │   Vespa Cloud   │
│   (Browser)     │    │   (Your Local   │    │   (Remote)      │
│                 │    │    Computer)    │    │                 │
│ • Search UI     │◄──►│ • ColPali Model │◄──►│ • Document Store│
│ • Similarity    │    │ • Query Proc.   │    │ • Vector Search │
│   Maps          │    │ • Sim Map Gen.  │    │ • Ranking       │
│ • Chat Interface│    │ • Gemini Int.   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
        ↑                        ↑                        ↑
   Web Browser              LOCAL AI               REMOTE Storage

🏠 LOCAL Processing (Your Computer)

All ColPali model inference happens on YOUR local machine:

ColPali Model: Runs locally on your GPU/CPU (~7GB model)
Document Processing: PDF → Images → Embeddings (local)
Query Processing: Text → Embeddings (local)
Similarity Maps: Visual attention generation (local)
Gemini Chat: Retrieved images are prepared locally, then sent to the Google Gemini API (remote) for answering

Device Detection:

from colpali_engine.utils.torch_utils import get_torch_device

device = get_torch_device("auto")  # Detects: CUDA, MPS (Apple), or CPU
print(f"Using device: {device}")   # Shows YOUR hardware

☁️ REMOTE Processing (Vespa Cloud)

On the Vespa side, only storage and search operations happen remotely:

Document Storage: Stores processed embeddings (not raw models)
Vector Search: Fast similarity search across document collection
Query Routing: Handles search requests and ranking
Metadata Storage: Document titles, URLs, page numbers

🔄 Complete Data Flow

Document Upload Process:

LOCAL: Your computer downloads PDF from URL
LOCAL: ColPali converts PDF pages to images
LOCAL: ColPali generates visual embeddings (1024 patches × 128 dims)
LOCAL: Embeddings converted to binary format for efficiency
REMOTE: Binary embeddings uploaded to Vespa Cloud
REMOTE: Vespa indexes embeddings for fast search
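The binary encoding step can be sketched with NumPy. This assumes the approach used in Vespa's public ColPali examples: threshold each dimension at zero, pack the 128 bits of every patch vector into 16 bytes, and hex-encode each patch for a mixed tensor field such as `tensor<int8>(patch{}, v[16])`. The function name and the dict-of-hex output layout are illustrative assumptions; feed_vespa.py may differ in the details.

```python
import numpy as np


def binarize_embeddings(embeddings: np.ndarray) -> dict[int, str]:
    """Pack (num_patches, 128) float embeddings into {patch_index: hex string of 16 bytes}."""
    bits = (embeddings > 0).astype(np.uint8)   # 1 bit per dimension: positive -> 1, else 0
    packed = np.packbits(bits, axis=1)          # (num_patches, 16) uint8, first dim is the high bit
    return {idx: row.tobytes().hex() for idx, row in enumerate(packed)}


# Random data shaped like one ColPali page: 1024 patches x 128 dims
rng = np.random.default_rng(0)
page = rng.standard_normal((1024, 128)).astype(np.float32)
cells = binarize_embeddings(page)
print(len(cells), len(cells[0]))  # 1024 32
```

Each patch shrinks from 128 floats to 16 bytes, at the cost of keeping only the sign of each dimension.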

Search Query Process:

LOCAL: You enter search query in browser
LOCAL: ColPali processes query → generates query embeddings
REMOTE: Query embeddings sent to Vespa Cloud
REMOTE: Vespa searches document index, returns matches
LOCAL: ColPali generates similarity maps for results
BROWSER: Results displayed with visual attention maps
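Ranking rests on ColPali's late-interaction (MaxSim) scoring, inherited from ColBERT: each query-token embedding is compared with every patch embedding of a page, the best match per token is kept, and those maxima are summed. Vespa typically evaluates this inside a rank profile; the NumPy sketch below only illustrates the arithmetic on float vectors with toy 2-dimensional data. The function names are hypothetical, and a binarized index may use a bit-level similarity instead of a float dot product.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one page.

    query_emb: (num_query_tokens, dim); page_emb: (num_patches, dim)
    """
    sims = query_emb @ page_emb.T          # (num_query_tokens, num_patches) dot products
    return float(sims.max(axis=1).sum())   # best patch per query token, summed over tokens


def rank_pages(query_emb: np.ndarray, pages: list) -> list:
    """Return page indices sorted from best to worst MaxSim score."""
    scores = [maxsim_score(query_emb, p) for p in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


q = np.array([[1.0, 0.0], [0.0, 1.0]])
page_a = np.array([[1.0, 0.0], [0.0, 0.5]])  # matches token 0 fully, token 1 partly
page_b = np.array([[0.2, 0.2], [0.0, 0.1]])
print(maxsim_score(q, page_a), rank_pages(q, [page_a, page_b]))  # 1.5 [0, 1]
```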

AI Chat Process:

LOCAL: Retrieved document images processed by your machine
REMOTE: Images + query sent to Google Gemini API
REMOTE: Gemini generates response based on visual content
BROWSER: Streaming response displayed in real-time

Core Components

ColPali Model: Visual-language model for document understanding (LOCAL)
Vespa Search: Distributed search and storage engine (REMOTE)
FastHTML Frontend: Modern, responsive web interface (BROWSER)
Gemini Integration: AI-powered question answering (REMOTE API)
Similarity Map Generator: Visual attention visualization (LOCAL)

💻 System Requirements

LOCAL Machine Requirements (For AI Processing)

Minimum:

CPU: Modern multi-core processor (Intel/AMD/Apple Silicon)
RAM: 8GB+ (16GB recommended)
Storage: 10GB free space (for model cache)
Python: 3.10+ (< 3.13)

Recommended:

GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3070/4060 or better)
Apple: M1/M2/M3 Mac (uses Metal Performance Shaders)
RAM: 16GB+ for smoother processing
Storage: SSD for faster model loading

Performance Examples:

RTX 4090: ~1-2 seconds per query
RTX 3070: ~3-5 seconds per query
Apple M2: ~4-6 seconds per query
CPU Only: ~15-30 seconds per query

REMOTE Requirements (Vespa Cloud)

What you need:

Vespa Cloud account (handles all remote processing)
Internet connection (for uploading embeddings and search queries)
Authentication tokens (provided by Vespa Cloud)

What Vespa Cloud provides:

Scalable storage for any number of documents
Sub-second search across millions of embeddings
High availability with automatic failover
Multi-region deployment options for lower latency

💰 Cost Breakdown

FREE Components

ColPali Model: Openly available weights, runs locally (no per-query costs)
Python Application: Apache-2.0 licensed, free to use
Local Processing: Uses your own hardware (no cloud AI fees)

PAID Components

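Vespa Cloud: Pay for storage and search operations (billing is based on provisioned resources, so treat the per-unit figures below as rough approximations)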
- ~$0.001 per 1000 searches
- ~$0.10 per GB storage per month
Google Gemini API: Optional, for chat features only
- ~$0.01 per 1000 image tokens
- Only used when you ask questions about documents

Cost Examples (Monthly)

Personal Use (100 documents, 1000 searches): ~$5-10/month
Small Business (1000 documents, 10k searches): ~$20-50/month
Enterprise (10k+ documents, 100k+ searches): $200+/month

💡 Cost Optimization Tips:

Use local Vespa installation to avoid cloud costs
Disable Gemini chat if not needed (saves API costs)
Process documents in batches to minimize upload time

🚀 Quick Start

Prerequisites

Python 3.10+ (< 3.13)
8GB+ RAM for ColPali model
Vespa Cloud account or local Vespa installation
Google Gemini API key (optional, for chat features)
GPU recommended but not required

1. Installation

# Clone the repository
git clone <repository-url>
cd colpali-vespa-visual-retrieval

# Install dependencies
pip install -e .

# For development
pip install -e ".[dev]"

# For document feeding capabilities
pip install -e ".[feed]"

2. Environment Configuration

Create a .env file with your configuration:

# Vespa Configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_secret_token

# Alternative: mTLS Authentication
USE_MTLS=false
VESPA_APP_MTLS_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_MTLS_KEY="-----BEGIN PRIVATE KEY-----..."
VESPA_CLOUD_MTLS_CERT="-----BEGIN CERTIFICATE-----..."

# Optional: Gemini AI (for chat features)
GEMINI_API_KEY=your_gemini_api_key

# Optional: Logging
LOG_LEVEL=INFO
HOT_RELOAD=false

3. Deploy Vespa Application

# Deploy the Vespa schema and configuration
python deploy_vespa_app.py \
  --tenant_name your_tenant \
  --vespa_application_name colpalidemo \
  --token_id_write colpalidemo_write \
  --token_id_read colpalidemo_read

4. Run the Application

python main.py

The application will be available at http://localhost:7860

📚 Document Management

Uploading Documents

Use the feeding script to process and upload PDF documents:

python feed_vespa.py \
  --application_name colpalidemo \
  --vespa_schema_name pdf_page

Document Processing Pipeline (LOCAL → REMOTE):

PDF Download (LOCAL): Your computer downloads PDFs from URLs
PDF Conversion (LOCAL): PDFs converted to images (one per page)
ColPali Processing (LOCAL): Each page processed by ColPali model on YOUR GPU/CPU
Embedding Generation (LOCAL): Visual embeddings created (1024 patches × 128 dimensions)
Binary Encoding (LOCAL): Embeddings converted to efficient binary format
Vespa Upload (REMOTE): Binary embeddings uploaded to Vespa Cloud
Search Indexing (REMOTE): Vespa indexes embeddings for fast retrieval

⚠️ Important Notes:

Processing Time: Expect 5-30 seconds per page depending on your hardware
Network Usage: Only final embeddings uploaded (~16KB per page in binary form, 1024 patches × 16 bytes, vs ~1MB original)
Privacy: Original PDFs and images stay on your local machine
Storage: Raw images cached locally for similarity map generation

Supported Operations

✅ Upload Documents: Add new PDFs to the system
✅ Search Documents: Query existing documents
✅ View Documents: Browse stored documents
❌ Remove Documents: Not currently implemented
❌ Update Documents: Not currently implemented

🔐 Authentication & Security

🛡️ Current Security Implementation

SECURE Components:

Vespa Authentication (REMOTE)

Token Authentication: Bearer tokens for Vespa Cloud API access
mTLS Certificates: Mutual TLS for enterprise security
Encrypted Communication: HTTPS/TLS for all Vespa connections

API Key Management (LOCAL)

Environment Variables: Sensitive keys stored in .env files
API Key Rotation: Google Gemini supports key rotation
Local Storage: Keys never transmitted except to authorized APIs

LIMITED Security Components:

Session Management

# Basic UUID session tracking (FastHTML)
import uuid
session["session_id"] = str(uuid.uuid4())

// HTTP-only cookies (Next.js)
cookieStore.set(SESSION_KEY, newSessionId, {
  httpOnly: true,
  secure: process.env.NODE_ENV === "production",
  sameSite: "lax",
  maxAge: 60 * 60 * 24 * 30, // 30 days
});

Basic Request Validation

# HTMX request validation
if "hx-request" not in request.headers:
    return RedirectResponse("/search")

// Parameter validation (Next.js)
if (!query) {
  return NextResponse.json({ error: "Query is required" }, { status: 400 });
}

⚠️ Security Limitations & Risks

MISSING Security Features:

❌ No API Authentication

Local API endpoints are completely open
No rate limiting or abuse protection
No user authentication or authorization
Anyone with network access can call /fetch_results, /get_sim_map, and similar endpoints

❌ No Input Sanitization

// Raw user input passed directly to models
const query = searchParams.get("query"); // No validation/sanitization
const ranking = searchParams.get("ranking"); // No input filtering

❌ No Security Headers

No CORS configuration
No Content Security Policy (CSP)
No X-Frame-Options protection
No X-Content-Type-Options header

❌ No Rate Limiting

Unlimited API requests
No protection against DoS attacks
No query throttling or user limits

❌ No CSRF Protection

No token validation for state-changing operations
Cross-site request forgery possible

🎯 Security Recommendations

IMMEDIATE (High Priority)

1. Add API Authentication

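// middleware.ts - Add API key validation
import { NextResponse, type NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  const apiKey = request.headers.get("X-API-Key");
  if (!apiKey || apiKey !== process.env.COLPALI_API_KEY) {
    return new Response("Unauthorized", { status: 401 });
  }
  return NextResponse.next();
}

// Only run on API routes (adjust to your paths); otherwise pages would need the key too
export const config = { matcher: "/api/:path*" };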

2. Implement Rate Limiting

// Assumes a local LRU-based helper such as Vercel's rate-limit example (@/lib/rate-limit)
import rateLimit from "@/lib/rate-limit";

const limiter = rateLimit({
  interval: 60 * 1000, // 1 minute window
  uniqueTokenPerInterval: 500, // Max number of distinct IPs tracked per window
});

// 10 requests per minute per IP; getClientIP is an app-specific helper
await limiter.check(10, getClientIP(request));

3. Add Security Headers

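// next.config.js
const securityHeaders = [
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
  {
    key: "Content-Security-Policy",
    // 'unsafe-inline' weakens CSP; prefer nonces or hashes where possible
    value: "default-src 'self'; script-src 'self' 'unsafe-inline'",
  },
];

module.exports = {
  // Apply the headers to every route
  async headers() {
    return [{ source: "/(.*)", headers: securityHeaders }];
  },
};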
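The next.config.js example above only covers a Next.js frontend. The backend in this project is FastHTML, which runs on Starlette/ASGI, so the same headers would also need to be set there. A minimal, framework-agnostic ASGI middleware sketch follows; the class name is hypothetical, CSP is left out because a correct policy depends on which scripts the UI loads, and with FastHTML/Starlette it could presumably be registered via `app.add_middleware(SecurityHeadersMiddleware)`.

```python
import asyncio

# Headers mirroring the next.config.js example (CSP intentionally omitted here)
SECURITY_HEADERS = [
    (b"x-frame-options", b"DENY"),
    (b"x-content-type-options", b"nosniff"),
    (b"referrer-policy", b"strict-origin-when-cross-origin"),
]


class SecurityHeadersMiddleware:
    """Pure ASGI middleware that adds security headers to every HTTP response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":  # leave websocket/lifespan traffic untouched
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                existing = {name.lower() for name, _ in headers}
                # Don't override headers the application already set
                headers += [h for h in SECURITY_HEADERS if h[0] not in existing]
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)


# Usage demo with a trivial ASGI app and an in-memory `send`
async def hello_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"ok"})


async def run_demo():
    sent = []

    async def receive():
        return {"type": "http.request", "body": b""}

    async def send(message):
        sent.append(message)

    await SecurityHeadersMiddleware(hello_app)({"type": "http"}, receive, send)
    return sent


messages = asyncio.run(run_demo())
print(dict(messages[0]["headers"]))
```

Wrapping `send` rather than the whole response keeps streaming responses (such as the SSE chat endpoint) intact, since only the start message is modified.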

4. Input Validation & Sanitization

import { z } from "zod";

const SearchSchema = z.object({
  query: z
    .string()
    .min(1)
    .max(500)
    // Restrictive allowlist: also rejects accents, hyphens, and apostrophes
    .regex(/^[a-zA-Z0-9\s\.\?\!]*$/),
  ranking: z.enum(["hybrid", "colpali", "bm25"]),
});

MEDIUM Priority

5. CORS Configuration

// Restrict origins to known domains
const corsHeaders = {
  "Access-Control-Allow-Origin": "https://yourdomain.com",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization, X-API-Key",
};

6. Request Size Limits

// Limit request payload sizes
export const config = {
  api: {
    bodyParser: {
      sizeLimit: "1mb",
    },
  },
};

7. Audit Logging

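# Log all API access with IP, endpoint, and query (timestamp comes from the log formatter)
import logging
logger = logging.getLogger("colpali.audit")
logger.info(f"API_ACCESS: {request.client.host} - {request.url.path} - {query[:100]}")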

LONG-TERM (Production Ready)

8. User Authentication (Optional)

// Add NextAuth.js or similar for user accounts
// Implement role-based access control
// Add document ownership and permissions

9. Network Security

# Deploy behind reverse proxy (nginx/cloudflare)
# Enable DDoS protection
# Use Web Application Firewall (WAF)

10. Data Privacy Controls

// Implement data retention policies
// Add user data deletion capabilities
// GDPR compliance features

🔒 Security Best Practices

For LOCAL Development:

Never commit API keys to version control
Use specific environment variable names (e.g. COLPALI_API_KEY rather than a generic API_KEY)
Rotate API keys regularly (monthly)
Enable firewall on development machines
Use HTTPS locally when testing production-like configurations

For PRODUCTION Deployment:

Deploy behind a CDN/WAF (e.g. Cloudflare, or AWS WAF with AWS Shield)
Enable rate limiting at infrastructure level
Use container security scanning
Implement monitoring and alerting
Regular security audits and penetration testing

For REMOTE Services:

Vespa Cloud: Follows enterprise security standards
Gemini API: Google-managed security and compliance
Environment Isolation: Separate dev/staging/prod credentials

🚨 Current Risk Level: MEDIUM

Suitable for:

✅ Personal projects and demos
✅ Internal company tools (behind firewall)
✅ Research and development environments

NOT suitable for:

❌ Public internet deployment
❌ Customer-facing applications
❌ Production environments with sensitive data
❌ Commercial applications without security hardening

🎯 Usage Guide

Basic Search

Navigate to the homepage
Enter your search query in natural language
Select ranking method (hybrid, semantic, etc.)
View results with similarity maps

Similarity Maps

Click on token buttons to see which parts of documents match specific query terms
Visual heatmaps show attention patterns
Reset button returns to original document view
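Conceptually, a token's similarity map is that query token's dot product with every patch embedding, reshaped to the patch grid and scaled for display. A NumPy sketch, assuming a 32 × 32 grid (1024 patches, matching the patch count above), row-major patch order, and min-max normalisation; the app's own implementation may differ, for example in normalisation or in how the grid is upsampled onto the page image.

```python
import numpy as np

GRID = 32  # assumed patch grid: 32 x 32 = 1024 patches


def similarity_map(query_token_emb: np.ndarray, patch_embs: np.ndarray) -> np.ndarray:
    """Heatmap (GRID x GRID) of one query token's similarity to each patch, scaled to [0, 1]."""
    scores = patch_embs @ query_token_emb      # one dot product per patch
    heat = scores.reshape(GRID, GRID)          # assumes row-major order: patch i -> (i // GRID, i % GRID)
    lo, hi = heat.min(), heat.max()
    if hi == lo:
        return np.zeros_like(heat)             # flat map: no patch stands out
    return (heat - lo) / (hi - lo)


# Toy example: only one patch matches the token
patches = np.zeros((GRID * GRID, 4))
patches[5 * GRID + 7] = [1.0, 0.0, 0.0, 0.0]
token = np.array([1.0, 0.0, 0.0, 0.0])
heat = similarity_map(token, patches)
print(np.unravel_index(heat.argmax(), heat.shape))  # location of the hottest patch
```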

AI Chat

Ask questions about retrieved documents
Chat responses are based on document content
Streaming responses for real-time interaction

Search Rankings

Hybrid: Combines multiple ranking signals
Semantic: Pure semantic similarity
BM25: Traditional text-based ranking
ColPali: Visual-first ranking

🛠️ Development

Project Structure

├── main.py                 # Application entry point
├── backend/
│   ├── colpali.py         # ColPali model integration
│   ├── vespa_app.py       # Vespa client and queries
│   └── modelmanager.py    # Model management utilities
├── frontend/
│   ├── app.py             # UI components
│   └── layout.py          # Layout templates
├── feed_vespa.py          # Document upload script
├── deploy_vespa_app.py    # Vespa deployment script
├── colpali-with-snippets/ # Vespa schema definitions
└── static/                # Static assets and generated files

Running in Development

# Enable hot reload
export HOT_RELOAD=true
python main.py

# Or set in .env
echo "HOT_RELOAD=true" >> .env

Code Quality

# Format code
ruff format .

# Lint code
ruff check .

📊 API Endpoints

Current API Routes (⚠️ UNSECURED)

| Endpoint | Method | Description | Security Status |
|---|---|---|---|
| `/` | GET | Homepage | ✅ Public (safe) |
| `/search` | GET | Search interface | ✅ Public (safe) |
| `/fetch_results` | GET | Fetch search results | ⚠️ OPEN API |
| `/get_sim_map` | GET | Get similarity maps | ⚠️ OPEN API |
| `/get-message` | GET | Chat with AI (SSE) | ⚠️ OPEN API |
| `/full_image` | GET | Get full document image | ⚠️ OPEN API |
| `/suggestions` | GET | Query autocomplete | ⚠️ OPEN API |
| `/static/*` | GET | Static file serving | ✅ Public (safe) |

Security Analysis by Endpoint

🔒 SECURE Endpoints

/ and /search: Static HTML pages, no sensitive data
/static/*: Public assets (CSS, JS, images)

⚠️ UNSECURED Endpoints (Risk)

/fetch_results - HIGH RISK

# Anyone can perform unlimited searches
curl "http://localhost:7860/fetch_results?query=secret&ranking=hybrid"

Risks: Resource abuse, server overload, competitive intelligence gathering
Exposes: Search capabilities, document metadata, processing times

/get_sim_map - MEDIUM RISK

# Access similarity maps without authentication
curl "http://localhost:7860/get_sim_map?query_id=123&idx=0&token=word&token_idx=5"

Risks: Unauthorized access to visual analysis
Exposes: Document visual patterns, query insights

/get-message - HIGH RISK

# Trigger AI processing without limits
curl "http://localhost:7860/get-message?query_id=123&query=question&doc_ids=doc1,doc2"

Risks: Gemini API abuse, cost exploitation, resource exhaustion
Exposes: AI-generated insights, document content analysis

/full_image - HIGH RISK

# Download any document image
curl "http://localhost:7860/full_image?doc_id=any_document_id"

Risks: Unauthorized document access, data leakage
Exposes: Full document images, potentially sensitive content

Immediate Security Fixes

1. Add API Key Authentication

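# FastHTML runs on Starlette, so Starlette middleware applies (the UI must then send X-API-Key too)
import os
import secrets
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

PROTECTED_PATHS = ("/fetch_results", "/get_sim_map", "/get-message", "/full_image", "/suggestions")
async def verify_api_key(request, call_next):
    if request.url.path.startswith(PROTECTED_PATHS):
        expected = os.getenv("COLPALI_API_KEY", "").encode()
        api_key = request.headers.get("X-API-Key", "").encode()
        if not expected or not secrets.compare_digest(api_key, expected):  # constant-time; unset key rejects all
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)

app.add_middleware(BaseHTTPMiddleware, dispatch=verify_api_key)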

2. Implement Rate Limiting

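from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter  # slowapi looks the limiter up on app.state
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@rt("/fetch_results")
@limiter.limit("10/minute")  # 10 requests per minute per IP
async def get_results(request, query: str, ranking: str):
    ...  # existing handler code (slowapi needs the `request` parameter)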
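To see what a per-IP limit such as "10/minute" does, here is a minimal in-memory sliding-window limiter in plain Python. It is a standalone illustration, not how slowapi is implemented internally; the class name is hypothetical, and because state lives in one process's memory it would not be shared across multiple workers.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int = 10, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        q = self.hits[key]
        # Drop timestamps that have fallen out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # a web handler would respond with HTTP 429 here
        q.append(now)
        return True


# Usage with a fake clock so the example is deterministic
t = [0.0]
limiter = SlidingWindowLimiter(limit=10, window=60, clock=lambda: t[0])
results = [limiter.allow("203.0.113.7") for _ in range(15)]
print(results.count(True))  # 10
t[0] = 61.0
print(limiter.allow("203.0.113.7"))  # True
```

A sliding window avoids the burst that fixed windows allow at a window boundary, at the cost of storing one timestamp per accepted request.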

3. Input Validation

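from typing import Literal

from pydantic import BaseModel, field_validator

class SearchRequest(BaseModel):
    query: str
    ranking: Literal["hybrid", "colpali", "bm25"]  # adjust to the app's ranking options

    @field_validator("query")
    @classmethod
    def query_must_be_safe(cls, v: str) -> str:
        v = v.strip()
        if not v or len(v) > 500:  # length bounds only; escape the query when rendering HTML
            raise ValueError("Query must be 1-500 characters")
        return v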
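The validator bounds the query, but "sanitization" for the XSS case exercised in the security tests is usually handled by escaping user text when it is rendered into HTML, rather than by stripping characters on input. A minimal sketch with the standard library's `html.escape`; the helper name and markup are hypothetical, and FastHTML components may already escape string children, so check before escaping twice.

```python
import html


def render_query_echo(query: str) -> str:
    """Build an HTML fragment that shows the user's query safely."""
    # html.escape converts &, <, > and (with quote=True) both quote characters to entities
    return f'<p class="query">Results for: {html.escape(query, quote=True)}</p>'


print(render_query_echo("<script>alert('xss')</script>"))
# <p class="query">Results for: &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;</p>
```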

4. Request Origin Validation

ALLOWED_ORIGINS = ["http://localhost:3000", "https://yourdomain.com"]

@app.middleware("http")
async def cors_middleware(request, call_next):
    origin = request.headers.get("origin")
    if origin not in ALLOWED_ORIGINS:
        return JSONResponse({"error": "Forbidden"}, status_code=403)
    return await call_next(request)

📈 Recommended API Security Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │  Rate Limiter   │    │   Backend API   │
│                 │    │                 │    │                 │
│ • API Key       │◄──►│ • IP Limiting   │◄──►│ • Input Valid.  │
│ • CORS Headers  │    │ • User Quotas   │    │ • Auth Checks   │
│ • Request Valid.│    │ • DoS Protection│    │ • Audit Logs    │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Benefits:

Layer 1: Frontend validates requests before sending
Layer 2: Rate limiter prevents abuse and DoS attacks
Layer 3: Backend performs final validation and authorization

🔒 Security Implementation Checklist

Before Production Deployment:

CRITICAL (Must Do):

Generate API Key: Create strong API key for endpoint authentication
Enable Rate Limiting: Implement per-IP request limits
Add Security Headers: X-Frame-Options, CSP, X-Content-Type-Options
Input Validation: Sanitize all user inputs (query, ranking)
CORS Configuration: Restrict origins to known domains only
Environment Security: Never commit API keys, use secure .env
HTTPS Only: Force TLS in production (no HTTP)
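For "Generate API Key", Python's standard `secrets` module is sufficient. A sketch that prints a value suitable for `COLPALI_API_KEY` (or `CSRF_SECRET`); 32 random bytes is a common choice, not a requirement of this project, and the function name is illustrative.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key (base64url of `num_bytes` random bytes, no padding)."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(f"COLPALI_API_KEY={generate_api_key()}")
```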

HIGH Priority:

Audit Logging: Log all API requests with IP and timestamp
Request Size Limits: Prevent large payload attacks
Error Handling: Don't expose stack traces or internal details
Session Security: HTTP-only, secure, SameSite cookies
API Documentation: Document authentication requirements

MEDIUM Priority:

User Authentication: Consider adding user accounts for access control
Request Timeout: Prevent long-running request abuse
Content Validation: Verify response content types
Monitoring: Set up alerts for unusual API usage patterns
Backup Strategy: Secure backup of environment variables

Security Testing Commands:

Test API Authentication:

# Should fail without API key
curl "http://localhost:7860/fetch_results?query=test&ranking=hybrid"

# Should succeed with API key
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test&ranking=hybrid"

Test Rate Limiting:

# Run multiple requests to trigger rate limit
for i in {1..15}; do
  curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test$i&ranking=hybrid"
  echo "Request $i"
done

Test Input Validation:

# Should reject invalid/malicious inputs
curl -G -H "X-API-Key: your_api_key" \
  --data-urlencode "query=<script>alert('xss')</script>" \
  --data-urlencode "ranking=invalid" \
  "http://localhost:7860/fetch_results"

Test Security Headers:

# Check security headers in response
curl -I "http://localhost:7860/"
# Should see: X-Frame-Options, X-Content-Type-Options, etc.

Security Monitoring:

Log Analysis Queries:

# Assumes the app writes these markers to /var/log/colpali.log (adjust path and markers)
# Monitor API usage patterns
grep "API_ACCESS" /var/log/colpali.log | tail -100

# Detect potential abuse
grep "RATE_LIMIT_EXCEEDED" /var/log/colpali.log

# Check authentication failures
grep "UNAUTHORIZED" /var/log/colpali.log

Alerting Setup:

Rate Limit Violations: Alert when >50 requests/minute from single IP
Authentication Failures: Alert on repeated unauthorized attempts
Unusual Queries: Alert on suspicious query patterns or injection attempts
Resource Usage: Alert on high CPU/memory usage (potential DoS)
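The first alert rule (more than 50 requests per minute from one IP) can be prototyped over the `API_ACCESS` lines from the audit-logging example. This sketch assumes each line starts with an ISO-8601 timestamp added by the log formatter, e.g. `2024-05-01T12:00:05 API_ACCESS: 203.0.113.7 - /fetch_results - query`; that layout is an assumption, so adjust the regex to your real format. It buckets requests by calendar minute, which is simpler than (and slightly different from) a sliding window.

```python
import re
from collections import Counter
from datetime import datetime

LINE_RE = re.compile(r"^(?P<ts>\S+) API_ACCESS: (?P<ip>\S+) - ")


def ips_over_threshold(lines, threshold: int = 50) -> set[tuple[str, str]]:
    """Return (ip, minute) pairs with more than `threshold` requests in that minute."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # ignore non-audit lines
        minute = datetime.fromisoformat(m["ts"]).strftime("%Y-%m-%dT%H:%M")
        counts[(m["ip"], minute)] += 1
    return {key for key, n in counts.items() if n > threshold}


sample = [f"2024-05-01T12:00:{s:02d} API_ACCESS: 203.0.113.7 - /fetch_results - test" for s in range(55)]
sample.append("2024-05-01T12:00:10 API_ACCESS: 198.51.100.2 - /search - hello")
print(ips_over_threshold(sample))  # {('203.0.113.7', '2024-05-01T12:00')}
```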

🧪 Models Used

ColPali v1.2: Visual document understanding
ColPaliGemma 3B: Base visual-language model
Google Gemini 2.0: AI chat and question answering

🔧 Configuration Options

Environment Variables

| Variable | Required | Description | Security Impact |
|---|---|---|---|
| `VESPA_APP_TOKEN_URL` | Yes* | Vespa application URL (token auth) | HIGH - Remote access |
| `VESPA_CLOUD_SECRET_TOKEN` | Yes* | Vespa secret token | CRITICAL - Full database access |
| `USE_MTLS` | No | Use mTLS instead of token auth | MEDIUM - Auth method |
| `VESPA_APP_MTLS_URL` | Yes** | Vespa application URL (mTLS) | HIGH - Remote access |
| `VESPA_CLOUD_MTLS_KEY` | Yes** | mTLS private key | CRITICAL - TLS credentials |
| `VESPA_CLOUD_MTLS_CERT` | Yes** | mTLS certificate | HIGH - TLS credentials |
| `GEMINI_API_KEY` | No | Google Gemini API key | HIGH - AI access/costs |
| `LOG_LEVEL` | No | Logging level (DEBUG, INFO, WARNING, ERROR) | LOW - Debug info |
| `HOT_RELOAD` | No | Enable hot reload in development | LOW - Dev convenience |

🔒 Security-Related Environment Variables (Recommended)

| Variable | Required | Description | Default |
|---|---|---|---|
| `COLPALI_API_KEY` | Yes*** | API key for endpoint authentication | None |
| `ALLOWED_ORIGINS` | Yes*** | Comma-separated allowed CORS origins | None |
| `RATE_LIMIT_REQUESTS` | No | Max requests per minute per IP | `10` |
| `RATE_LIMIT_WINDOW` | No | Rate limit window in seconds | `60` |
| `MAX_QUERY_LENGTH` | No | Maximum query string length | `500` |
| `ENABLE_AUDIT_LOGGING` | No | Log all API requests for security | `false` |
| `SECURITY_HEADERS_ENABLED` | No | Enable security headers | `true` |
| `CSRF_SECRET` | Yes*** | Secret for CSRF token generation | None |
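These variables are recommendations rather than settings the current code reads. One way the backend could load them with the defaults listed in the table; the dataclass and function names are hypothetical, and the "required" check covers the three variables marked as required for production.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SecuritySettings:
    api_key: Optional[str]
    allowed_origins: List[str]
    rate_limit_requests: int
    rate_limit_window: int
    max_query_length: int
    enable_audit_logging: bool
    security_headers_enabled: bool
    csrf_secret: Optional[str]


def _env_bool(name: str, default: bool) -> bool:
    """Interpret common truthy strings; fall back to `default` when unset."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_security_settings() -> SecuritySettings:
    """Read the recommended security variables, applying the table's defaults."""
    origins = os.getenv("ALLOWED_ORIGINS", "")
    return SecuritySettings(
        api_key=os.getenv("COLPALI_API_KEY") or None,
        allowed_origins=[o.strip() for o in origins.split(",") if o.strip()],
        rate_limit_requests=int(os.getenv("RATE_LIMIT_REQUESTS", "10")),
        rate_limit_window=int(os.getenv("RATE_LIMIT_WINDOW", "60")),
        max_query_length=int(os.getenv("MAX_QUERY_LENGTH", "500")),
        enable_audit_logging=_env_bool("ENABLE_AUDIT_LOGGING", False),
        security_headers_enabled=_env_bool("SECURITY_HEADERS_ENABLED", True),
        csrf_secret=os.getenv("CSRF_SECRET") or None,
    )


def missing_required(settings: SecuritySettings) -> List[str]:
    """Production-required variables that are unset or empty."""
    required = {
        "COLPALI_API_KEY": settings.api_key,
        "ALLOWED_ORIGINS": settings.allowed_origins,
        "CSRF_SECRET": settings.csrf_secret,
    }
    return [name for name, value in required.items() if not value]


if __name__ == "__main__":
    missing = missing_required(load_security_settings())
    print("Missing:", ", ".join(missing) if missing else "none")
```

Failing fast at startup when `missing_required` is non-empty keeps a misconfigured production instance from serving unauthenticated traffic.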

Example Security-Enhanced .env:

# Existing configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_vespa_secret_token
GEMINI_API_KEY=your_gemini_api_key

# NEW: Security configuration
COLPALI_API_KEY=your_strong_random_api_key_here
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60
MAX_QUERY_LENGTH=500
ENABLE_AUDIT_LOGGING=true
SECURITY_HEADERS_ENABLED=true
CSRF_SECRET=your_random_csrf_secret_here

# Development vs Production
NODE_ENV=production  # Enable secure cookies
LOG_LEVEL=INFO       # Don't expose debug info in production

*Required for token authentication
**Required for mTLS authentication
***Required for production security
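`CSRF_SECRET` is listed as a recommended variable, but nothing in the current code uses it. A minimal sketch of how it could back a CSRF token: an HMAC-SHA256 of the session ID (the app already assigns a UUID `session_id`), verified in constant time on state-changing requests. The function names are hypothetical, and a stateless HMAC token is one common pattern, not the only one.

```python
import hashlib
import hmac
import secrets


def make_csrf_token(session_id: str, secret: str) -> str:
    """Derive a CSRF token bound to one session."""
    return hmac.new(secret.encode(), session_id.encode(), hashlib.sha256).hexdigest()


def verify_csrf_token(session_id: str, token: str, secret: str) -> bool:
    """Check a submitted token against the expected one in constant time."""
    expected = make_csrf_token(session_id, secret)
    return hmac.compare_digest(expected.encode(), token.encode())


secret = secrets.token_urlsafe(32)  # in practice: os.environ["CSRF_SECRET"]
token = make_csrf_token("session-123", secret)
print(verify_csrf_token("session-123", token, secret))  # True
print(verify_csrf_token("session-456", token, secret))  # False
```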

🚨 Troubleshooting

LOCAL Processing Issues

ColPali model fails to load:

# Check GPU memory
nvidia-smi  # For NVIDIA GPUs
# or
system_profiler SPDisplaysDataType  # For Apple Silicon

# Clear model cache if corrupted
rm -rf ~/.cache/huggingface/hub/models--vidore--colpali-v1.2

Out of memory errors:

Reduce batch size in feed_vespa.py (try batch_size=1)
Close other applications to free RAM/VRAM
Use CPU processing if GPU memory is insufficient (NVIDIA): CUDA_VISIBLE_DEVICES="" python main.py

Slow processing on CPU:

Expected behavior - ColPali requires significant computation
Consider upgrading to GPU or Apple Silicon for 5-10x speedup
Process documents overnight for large collections

REMOTE Processing Issues

Connection to Vespa fails:

Verify your Vespa URL and credentials in .env
Check if the Vespa application is deployed and running
Ensure network connectivity: ping your-app.vespa-cloud.com
Validate authentication tokens haven't expired

Document upload fails:

Check Vespa Cloud storage quota and billing
Verify embedding format matches Vespa schema
Ensure stable internet connection for large uploads

Search returns no results:

Confirm documents were successfully uploaded to Vespa
Check if embeddings were properly indexed
Verify query processing isn't failing locally

MIXED (Local + Remote) Issues

Chat features don't work:

LOCAL: Verify document images are being generated locally
REMOTE: Check GEMINI_API_KEY is set correctly
REMOTE: Verify Gemini API quota and billing
NETWORK: Ensure images can be sent to Gemini API

Similarity maps missing:

LOCAL: Confirm ColPali model loaded successfully
LOCAL: Check if similarity map generation completed
REMOTE: Verify Vespa returned similarity data
BROWSER: Clear browser cache for static files

Performance Tips

LOCAL Optimization:

Use GPU acceleration for 5-10x faster model inference
Optimize batch sizes based on available memory
Use SSD storage for faster model loading
Consider quantized models for lower memory usage

REMOTE Optimization:

Use Vespa's HNSW indexing for faster search
Optimize embedding dimensions vs accuracy tradeoff
Enable compression for faster network transfer
Use multiple Vespa instances for high availability

NETWORK Optimization:

Process documents in batches to reduce upload overhead
Use compression for embedding transfer
Consider regional Vespa deployment for lower latency

📄 License

Apache-2.0

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Run tests and linting
Submit a pull request

📞 Support

For issues and questions:

Check the troubleshooting section
Review Vespa and ColPali documentation
Open an issue on the repository