# ColPali 🤝 Vespa - Visual Retrieval System

A visual document retrieval system that combines **ColPali** (a ColBERT-style late-interaction retriever built on the PaliGemma vision-language model) with **Vespa** for scalable document search and question answering.

## 🌟 Features

### 🔍 **Visual Document Search**

- **Multi-modal retrieval**: Search through PDF documents using natural language queries
- **Visual understanding**: ColPali model processes document images and text simultaneously
- **Token-level similarity maps**: Visualize exactly which parts of documents match your query
- **Multiple ranking algorithms**: Choose between hybrid, semantic, and other ranking methods

### 🧠 **AI-Powered Chat**

- **Intelligent Q&A**: Ask questions about retrieved documents using Google Gemini 2.0
- **Context-aware responses**: AI analyzes document images to provide accurate answers
- **Real-time streaming**: Get responses as they're generated

### ⚡ **Scalable Infrastructure**

- **Vespa integration**: Enterprise-grade search platform for large document collections
- **Real-time processing**: Instant search results and similarity map generation
- **Cloud-ready**: Supports Vespa Cloud deployment with secure authentication

## 🏗️ Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │    Backend      │    │   Vespa Cloud   │
│   (Browser)     │    │   (Your Local   │    │   (Remote)      │
│                 │    │    Computer)    │    │                 │
│ • Search UI     │◄──►│ • ColPali Model │◄──►│ • Document Store│
│ • Similarity    │    │ • Query Proc.   │    │ • Vector Search │
│   Maps          │    │ • Sim Map Gen.  │    │ • Ranking       │
│ • Chat Interface│    │ • Gemini Int.   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
        ↑                        ↑                        ↑
   Web Browser              LOCAL AI               REMOTE Storage
```

### 🏠 **LOCAL Processing (Your Computer)**

**ColPali model inference happens on YOUR local machine; only the optional Gemini chat calls a remote API:**

- **ColPali Model**: Runs locally on your GPU/CPU (~7GB model)
- **Document Processing**: PDF → Images → Embeddings (local)
- **Query Processing**: Text → Embeddings (local)
- **Similarity Maps**: Visual attention generation (local)
- **Gemini Chat**: Retrieved images are prepared locally, then sent to the Gemini API (remote)

**Device Detection:**

```python
from colpali_engine.utils.torch_utils import get_torch_device

device = get_torch_device("auto")  # Detects: CUDA, MPS (Apple), or CPU
print(f"Using device: {device}")   # Shows YOUR hardware
```

### ☁️ **REMOTE Processing (Vespa Cloud)**

**Only storage and search index operations happen remotely:**

- **Document Storage**: Stores processed embeddings (not raw models)
- **Vector Search**: Fast similarity search across document collection
- **Query Routing**: Handles search requests and ranking
- **Metadata Storage**: Document titles, URLs, page numbers

### 🔄 **Complete Data Flow**

#### **Document Upload Process:**

1. **LOCAL**: Your computer downloads PDF from URL
2. **LOCAL**: PDF pages are rendered to images (one per page)
3. **LOCAL**: ColPali generates visual embeddings (1024 patches × 128 dims)
4. **LOCAL**: Embeddings converted to binary format for efficiency
5. **REMOTE**: Binary embeddings uploaded to Vespa Cloud
6. **REMOTE**: Vespa indexes embeddings for fast search
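
Step 4 above can be pictured with a short sketch. It assumes that binarization thresholds each dimension at zero and packs the bits most-significant-bit first; the actual encoding in `feed_vespa.py` may differ, and `binarize_patch` is a hypothetical helper name.

```python
def binarize_patch(vector: list[float]) -> bytes:
    """Pack a float vector into bits: 1 where the value is > 0, else 0."""
    if len(vector) % 8 != 0:
        raise ValueError("vector length must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(vector), 8):
        byte = 0
        for value in vector[start:start + 8]:
            # Shift left and append the next bit (most significant bit first)
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return bytes(packed)


# A 128-dimensional patch embedding shrinks to 16 bytes
patch = [0.5, -0.1] * 64
encoded = binarize_patch(patch)
print(len(encoded))  # 16
```

Under these assumptions each patch goes from 128 floats to 16 bytes, which is where the upload savings come from.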

#### **Search Query Process:**

1. **LOCAL**: You enter search query in browser
2. **LOCAL**: ColPali processes query → generates query embeddings
3. **REMOTE**: Query embeddings sent to Vespa Cloud
4. **REMOTE**: Vespa searches document index, returns matches
5. **LOCAL**: ColPali generates similarity maps for results
6. **BROWSER**: Results displayed with visual attention maps
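
The matching in step 4 is typically a late-interaction (MaxSim) score for ColBERT-style models: each query token picks its best-matching page patch, and the per-token maxima are summed. The sketch below shows that idea on toy vectors. It is an illustration only; the rank profile actually deployed to Vespa (for example, whether it first compares binary vectors by Hamming distance) is not described here, and `maxsim_score` is a hypothetical name.

```python
def maxsim_score(query_tokens: list[list[float]], page_patches: list[list[float]]) -> float:
    """Sum, over query tokens, of the best dot product against any page patch."""
    total = 0.0
    for q in query_tokens:
        best = max(sum(a * b for a, b in zip(q, p)) for p in page_patches)
        total += best
    return total


# Two query tokens, two candidate pages with two patches each
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]
page_b = [[0.5, 0.5], [0.4, 0.4]]
print(maxsim_score(query, page_a) > maxsim_score(query, page_b))  # True
```

Page A wins because each query token finds a patch that matches it strongly, even though no single patch matches both tokens.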

#### **AI Chat Process:**

1. **LOCAL**: Retrieved document images processed by your machine
2. **REMOTE**: Images + query sent to Google Gemini API
3. **REMOTE**: Gemini generates response based on visual content
4. **BROWSER**: Streaming response displayed in real-time

### Core Components

- **ColPali Model**: Visual-language model for document understanding (LOCAL)
- **Vespa Search**: Distributed search and storage engine (REMOTE)
- **FastHTML Frontend**: Modern, responsive web interface (BROWSER)
- **Gemini Integration**: AI-powered question answering (REMOTE API)
- **Similarity Map Generator**: Visual attention visualization (LOCAL)

## 💻 **System Requirements**

### **LOCAL Machine Requirements (For AI Processing)**

**Minimum:**

- **CPU**: Modern multi-core processor (Intel/AMD/Apple Silicon)
- **RAM**: 8GB+ (16GB recommended)
- **Storage**: 10GB free space (for model cache)
- **Python**: 3.10+ (< 3.13)

**Recommended:**

- **GPU**: NVIDIA GPU with 8GB+ VRAM (RTX 3070/4060 or better)
- **Apple**: M1/M2/M3 Mac (uses Metal Performance Shaders)
- **RAM**: 16GB+ for smoother processing
- **Storage**: SSD for faster model loading

**Performance Examples:**

- **RTX 4090**: ~1-2 seconds per query
- **RTX 3070**: ~3-5 seconds per query
- **Apple M2**: ~4-6 seconds per query
- **CPU Only**: ~15-30 seconds per query

### **REMOTE Requirements (Vespa Cloud)**

**What you need:**

- **Vespa Cloud account** (handles all remote processing)
- **Internet connection** (for uploading embeddings and search queries)
- **Authentication tokens** (provided by Vespa Cloud)

**What Vespa Cloud provides:**

- **Scalable storage** for any number of documents
- **Sub-second search** across millions of embeddings
- **High availability** with automatic failover
- **Global CDN** for fast access worldwide

## 💰 **Cost Breakdown**

### **FREE Components**

- **ColPali Model**: Open source, runs locally (no per-query costs)
- **Python Application**: Apache-2.0 licensed, completely free
- **Local Processing**: Uses your own hardware (no cloud AI fees)

### **PAID Components**

- **Vespa Cloud**: Pay for storage and search operations
  - ~$0.001 per 1000 searches
  - ~$0.10 per GB storage per month
- **Google Gemini API**: Optional, for chat features only
  - ~$0.01 per 1000 image tokens
  - Only used when you ask questions about documents

### **Cost Examples (Monthly)**

- **Personal Use** (100 documents, 1000 searches): ~$5-10/month
- **Small Business** (1000 documents, 10k searches): ~$20-50/month
- **Enterprise** (10k+ documents, 100k+ searches): $200+/month

**💡 Cost Optimization Tips:**

- Use local Vespa installation to avoid cloud costs
- Disable Gemini chat if not needed (saves API costs)
- Process documents in batches to minimize upload time

## 🚀 Quick Start

### Prerequisites

- Python 3.10+ (< 3.13)
- **8GB+ RAM** for ColPali model
- **Vespa Cloud account** or local Vespa installation
- **Google Gemini API key** (optional, for chat features)
- **GPU recommended** but not required

### 1. Installation

```bash
# Clone the repository
git clone <repository-url>
cd colpali-vespa-visual-retrieval

# Install dependencies
pip install -e .

# For development
pip install -e ".[dev]"

# For document feeding capabilities
pip install -e ".[feed]"
```

### 2. Environment Configuration

Create a `.env` file with your configuration:

```bash
# Vespa Configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_secret_token

# Alternative: mTLS Authentication
USE_MTLS=false
VESPA_APP_MTLS_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_MTLS_KEY="-----BEGIN PRIVATE KEY-----..."
VESPA_CLOUD_MTLS_CERT="-----BEGIN CERTIFICATE-----..."

# Optional: Gemini AI (for chat features)
GEMINI_API_KEY=your_gemini_api_key

# Optional: Logging
LOG_LEVEL=INFO
HOT_RELOAD=false
```

### 3. Deploy Vespa Application

```bash
# Deploy the Vespa schema and configuration
python deploy_vespa_app.py \
  --tenant_name your_tenant \
  --vespa_application_name colpalidemo \
  --token_id_write colpalidemo_write \
  --token_id_read colpalidemo_read
```

### 4. Run the Application

```bash
python main.py
```

The application will be available at `http://localhost:7860`

## 📚 Document Management

### Uploading Documents

Use the feeding script to process and upload PDF documents:

```bash
python feed_vespa.py \
  --application_name colpalidemo \
  --vespa_schema_name pdf_page
```

**Document Processing Pipeline (LOCAL → REMOTE):**

1. **PDF Download** (LOCAL): Your computer downloads PDFs from URLs
2. **PDF Conversion** (LOCAL): PDFs converted to images (one per page)
3. **ColPali Processing** (LOCAL): Each page processed by ColPali model on YOUR GPU/CPU
4. **Embedding Generation** (LOCAL): Visual embeddings created (1024 patches × 128 dimensions)
5. **Binary Encoding** (LOCAL): Embeddings converted to efficient binary format
6. **Vespa Upload** (REMOTE): Binary embeddings uploaded to Vespa Cloud
7. **Search Indexing** (REMOTE): Vespa indexes embeddings for fast retrieval

**⚠️ Important Notes:**

- **Processing Time**: Expect 5-30 seconds per page depending on your hardware
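- **Network Usage**: Only the binary embeddings are uploaded (about 16 KB per page at 1024 patches × 128 bits, vs ~1MB for the original page)
- **Privacy**: Original PDFs and page images are not uploaded to Vespa; page images leave your machine only when they are sent to the Gemini API for chat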
- **Storage**: Raw images cached locally for similarity map generation
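
As a rough size check, the per-page upload can be estimated from the embedding shape given in the pipeline (1024 patches × 128 dimensions). The sketch assumes one bit per dimension after binary encoding and ignores metadata and any protocol overhead; `embedding_bytes` is a hypothetical helper.

```python
PATCHES_PER_PAGE = 1024
DIMS_PER_PATCH = 128


def embedding_bytes(bits_per_dim: int) -> int:
    """Bytes needed for one page's embeddings at a given precision."""
    return PATCHES_PER_PAGE * DIMS_PER_PATCH * bits_per_dim // 8


float32_size = embedding_bytes(32)   # 524288 bytes (512 KiB)
binary_size = embedding_bytes(1)     # 16384 bytes (16 KiB)
print(float32_size // binary_size)   # 32
```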

### Supported Operations

- ✅ **Upload Documents**: Add new PDFs to the system
- ✅ **Search Documents**: Query existing documents
- ✅ **View Documents**: Browse stored documents
- ❌ **Remove Documents**: _Not currently implemented_
- ❌ **Update Documents**: _Not currently implemented_

## 🔐 Authentication & Security

### 🛡️ **Current Security Implementation**

#### **SECURE Components:**

**Vespa Authentication (REMOTE)**

- **Token Authentication**: Bearer tokens for Vespa Cloud API access
- **mTLS Certificates**: Mutual TLS for enterprise security
- **Encrypted Communication**: HTTPS/TLS for all Vespa connections

**API Key Management (LOCAL)**

- **Environment Variables**: Sensitive keys stored in `.env` files
- **API Key Rotation**: Google Gemini supports key rotation
- **Local Storage**: Keys never transmitted except to authorized APIs

#### **LIMITED Security Components:**

**Session Management**

```python
import uuid

# Basic UUID session tracking (FastHTML)
session["session_id"] = str(uuid.uuid4())
```

```typescript
// HTTP-only cookies (Next.js)
cookieStore.set(SESSION_KEY, newSessionId, {
  httpOnly: true,
  secure: process.env.NODE_ENV === "production",
  sameSite: "lax",
  maxAge: 60 * 60 * 24 * 30, // 30 days
});
```

**Basic Request Validation**

```python
# HTMX request validation (FastHTML)
if "hx-request" not in request.headers:
    return RedirectResponse("/search")
```

```typescript
// Parameter validation (Next.js)
if (!query) {
  return NextResponse.json({ error: "Query is required" }, { status: 400 });
}
```

### ⚠️ **Security Limitations & Risks**

#### **MISSING Security Features:**

**❌ No API Authentication**

- Local API endpoints are **completely open**
- No rate limiting or abuse protection
- No user authentication or authorization
- Anyone can access `/fetch_results`, `/get_sim_map` endpoints

**❌ No Input Sanitization**

```python
# Raw user input passed directly to models
query = searchParams.get("query")  # No validation/sanitization
ranking = searchParams.get("ranking")  # No input filtering
```

**❌ No Security Headers**

- No CORS configuration
- No Content Security Policy (CSP)
- No X-Frame-Options protection
- No X-Content-Type-Options validation

**❌ No Rate Limiting**

- Unlimited API requests
- No protection against DoS attacks
- No query throttling or user limits

**❌ No CSRF Protection**

- No token validation for state-changing operations
- Cross-site request forgery possible
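
One common way to close this gap is an HMAC-signed token tied to the session. The sketch below uses only the standard library. It assumes a server-side secret (such as the `CSRF_SECRET` variable recommended later in this document) and the existing `session_id`; `make_csrf_token` and `verify_csrf_token` are hypothetical helpers, not part of the current codebase.

```python
import hashlib
import hmac
import secrets


def make_csrf_token(secret: str, session_id: str) -> str:
    """Return 'nonce.signature', where the signature binds the nonce to the session."""
    nonce = secrets.token_hex(16)
    message = f"{session_id}:{nonce}".encode()
    signature = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    return f"{nonce}.{signature}"


def verify_csrf_token(secret: str, session_id: str, token: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    nonce, sep, signature = token.partition(".")
    if not sep:
        return False
    message = f"{session_id}:{nonce}".encode()
    expected = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature.encode(), expected.encode())
```

The server would embed the token in forms or a request header and reject state-changing requests whose token fails verification.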

### 🎯 **Security Recommendations**

#### **IMMEDIATE (High Priority)**

**1. Add API Authentication**

```typescript
// middleware.ts - Add API key validation
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  const apiKey = request.headers.get("X-API-Key");
  if (!apiKey || apiKey !== process.env.COLPALI_API_KEY) {
    return new Response("Unauthorized", { status: 401 });
  }
  // Key is valid: continue to the requested route
  return NextResponse.next();
}
```

**2. Implement Rate Limiting**

```typescript
// Use next-rate-limit or similar
import rateLimit from "@/lib/rate-limit";

const limiter = rateLimit({
  interval: 60 * 1000, // 1 minute
  uniqueTokenPerInterval: 500, // Track up to 500 unique clients per interval
});

await limiter.check(10, getClientIP(request)); // 10 requests per minute
```

**3. Add Security Headers**

```typescript
// next.config.js
const securityHeaders = [
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
  {
    key: "Content-Security-Policy",
    value: "default-src 'self'; script-src 'self' 'unsafe-inline'",
  },
];

// Apply the headers to every route
module.exports = {
  async headers() {
    return [{ source: "/(.*)", headers: securityHeaders }];
  },
};

**4. Input Validation & Sanitization**

```typescript
import { z } from "zod";

const SearchSchema = z.object({
  query: z
    .string()
    .min(1)
    .max(500)
    .regex(/^[a-zA-Z0-9\s\.\?\!]*$/),
  ranking: z.enum(["hybrid", "colpali", "bm25"]),
});
```

#### **MEDIUM Priority**

**5. CORS Configuration**

```typescript
// Restrict origins to known domains
const corsHeaders = {
  "Access-Control-Allow-Origin": "https://yourdomain.com",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization",
};
```

**6. Request Size Limits**

```typescript
// Limit request payload sizes
export const config = {
  api: {
    bodyParser: {
      sizeLimit: "1mb",
    },
  },
};
```

**7. Audit Logging**

```python
# Log all API access with IP, timestamp, and queries
logger.info(f"API_ACCESS: {client_ip} - {endpoint} - {query[:100]}")
```
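
A slightly fuller sketch of the same idea, using only the standard `logging` module. The logger name `colpali.audit` and both helper functions are assumptions for illustration. Newlines are stripped from the query so that user input cannot forge extra log entries.

```python
import logging

logger = logging.getLogger("colpali.audit")


def format_access(client_ip: str, endpoint: str, query: str, max_len: int = 100) -> str:
    """Build one audit line; newlines are replaced and the query is truncated."""
    safe_query = query.replace("\r", " ").replace("\n", " ")[:max_len]
    return f"API_ACCESS: {client_ip} - {endpoint} - {safe_query}"


def log_access(client_ip: str, endpoint: str, query: str) -> None:
    """Emit the audit line at INFO level."""
    logger.info(format_access(client_ip, endpoint, query))
```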

#### **LONG-TERM (Production Ready)**

**8. User Authentication (Optional)**

```typescript
// Add NextAuth.js or similar for user accounts
// Implement role-based access control
// Add document ownership and permissions
```

**9. Network Security**

```bash
# Deploy behind reverse proxy (nginx/cloudflare)
# Enable DDoS protection
# Use Web Application Firewall (WAF)
```

**10. Data Privacy Controls**

```typescript
// Implement data retention policies
// Add user data deletion capabilities
// GDPR compliance features
```

### 🔒 **Security Best Practices**

#### **For LOCAL Development:**

- **Never commit API keys** to version control
- **Use specific environment variable names** (e.g. `GEMINI_API_KEY` rather than a generic `API_KEY`)
- **Rotate API keys regularly** (monthly)
- **Enable firewall** on development machines
- **Use HTTPS locally** when testing production-like configurations

#### **For PRODUCTION Deployment:**

- **Deploy behind CDN/WAF** (Cloudflare, AWS Shield)
- **Enable rate limiting** at infrastructure level
- **Use container security scanning**
- **Implement monitoring and alerting**
- **Regular security audits and penetration testing**

#### **For REMOTE Services:**

- **Vespa Cloud**: Follows enterprise security standards
- **Gemini API**: Google-managed security and compliance
- **Environment Isolation**: Separate dev/staging/prod credentials

### 🚨 **Current Risk Level: MEDIUM**

**Suitable for:**

- ✅ **Personal projects and demos**
- ✅ **Internal company tools** (behind firewall)
- ✅ **Research and development** environments

**NOT suitable for:**

- ❌ **Public internet deployment**
- ❌ **Customer-facing applications**
- ❌ **Production environments** with sensitive data
- ❌ **Commercial applications** without security hardening

## 🎯 Usage Guide

### Basic Search

1. Navigate to the homepage
2. Enter your search query in natural language
3. Select ranking method (hybrid, semantic, etc.)
4. View results with similarity maps

### Similarity Maps

- Click on token buttons to see which parts of documents match specific query terms
- Visual heatmaps show attention patterns
- Reset button returns to original document view
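
Conceptually, a per-token map is the similarity between one query token and every page patch, laid out on the patch grid. The sketch below assumes patches arrive in row-major order and that 1024 patches form a 32 × 32 grid; the real generator in the backend may normalize, upsample, and blend the result over the page image. `similarity_grid` is a hypothetical helper.

```python
def similarity_grid(query_token: list[float], patches: list[list[float]], side: int) -> list[list[float]]:
    """Score every patch against one query token and reshape to side x side."""
    if len(patches) != side * side:
        raise ValueError("expected side * side patches")
    scores = [sum(a * b for a, b in zip(query_token, p)) for p in patches]
    return [scores[row * side:(row + 1) * side] for row in range(side)]


# Toy example: a 2 x 2 grid instead of the 32 x 32 grid implied by 1024 patches
token = [1.0, 0.0]
patches = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.3], [0.0, 1.0]]
grid = similarity_grid(token, patches, side=2)
print(grid)  # [[0.1, 0.8], [0.3, 0.0]]
```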

### AI Chat

- Ask questions about retrieved documents
- Chat responses are based on document content
- Streaming responses for real-time interaction

### Search Rankings

- **Hybrid**: Combines multiple ranking signals
- **Semantic**: Pure semantic similarity
- **BM25**: Traditional text-based ranking
- **ColPali**: Visual-first ranking

## 🛠️ Development

### Project Structure

```
├── main.py                 # Application entry point
├── backend/
│   ├── colpali.py         # ColPali model integration
│   ├── vespa_app.py       # Vespa client and queries
│   └── modelmanager.py    # Model management utilities
├── frontend/
│   ├── app.py             # UI components
│   └── layout.py          # Layout templates
├── feed_vespa.py          # Document upload script
├── deploy_vespa_app.py    # Vespa deployment script
├── colpali-with-snippets/ # Vespa schema definitions
└── static/                # Static assets and generated files
```

### Running in Development

```bash
# Enable hot reload
export HOT_RELOAD=true
python main.py

# Or set in .env
echo "HOT_RELOAD=true" >> .env
```

### Code Quality

```bash
# Format code
ruff format .

# Lint code
ruff check .
```

## 📊 API Endpoints

### **Current API Routes (⚠️ UNSECURED)**

| Endpoint         | Method | Description             | Security Status  |
| ---------------- | ------ | ----------------------- | ---------------- |
| `/`              | GET    | Homepage                | ✅ Public (safe) |
| `/search`        | GET    | Search interface        | ✅ Public (safe) |
| `/fetch_results` | GET    | Fetch search results    | ⚠️ **OPEN API**  |
| `/get_sim_map`   | GET    | Get similarity maps     | ⚠️ **OPEN API**  |
| `/get-message`   | GET    | Chat with AI (SSE)      | ⚠️ **OPEN API**  |
| `/full_image`    | GET    | Get full document image | ⚠️ **OPEN API**  |
| `/suggestions`   | GET    | Query autocomplete      | ⚠️ **OPEN API**  |
| `/static/*`      | GET    | Static file serving     | ✅ Public (safe) |

### **Security Analysis by Endpoint**

#### **🔒 SECURE Endpoints**

- **`/`** and **`/search`**: Static HTML pages, no sensitive data
- **`/static/*`**: Public assets (CSS, JS, images)

#### **⚠️ UNSECURED Endpoints (Risk)**

**`/fetch_results`** - **HIGH RISK**

```bash
# Anyone can perform unlimited searches
curl "http://localhost:7860/fetch_results?query=secret&ranking=hybrid"
```

- **Risks**: Resource abuse, server overload, competitive intelligence gathering
- **Exposes**: Search capabilities, document metadata, processing times

**`/get_sim_map`** - **MEDIUM RISK**

```bash
# Access similarity maps without authentication
curl "http://localhost:7860/get_sim_map?query_id=123&idx=0&token=word&token_idx=5"
```

- **Risks**: Unauthorized access to visual analysis
- **Exposes**: Document visual patterns, query insights

**`/get-message`** - **HIGH RISK**

```bash
# Trigger AI processing without limits
curl "http://localhost:7860/get-message?query_id=123&query=question&doc_ids=doc1,doc2"
```

- **Risks**: Gemini API abuse, cost exploitation, resource exhaustion
- **Exposes**: AI-generated insights, document content analysis

**`/full_image`** - **HIGH RISK**

```bash
# Download any document image
curl "http://localhost:7860/full_image?doc_id=any_document_id"
```

- **Risks**: Unauthorized document access, data leakage
- **Exposes**: Full document images, potentially sensitive content

### **Immediate Security Fixes**

#### **1. Add API Key Authentication**

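```python
# Starlette-style HTTP middleware sketch (FastHTML builds on Starlette)
import hmac
import os

from starlette.responses import JSONResponse

PROTECTED_PATHS = ("/fetch_results", "/get_sim_map", "/get-message", "/full_image", "/suggestions")

@app.middleware("http")
async def verify_api_key(request, call_next):
    if request.url.path.startswith(PROTECTED_PATHS):
        api_key = request.headers.get("X-API-Key", "")
        expected = os.getenv("COLPALI_API_KEY", "")
        if not expected or not hmac.compare_digest(api_key.encode(), expected.encode()):
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)
```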

#### **2. Implement Rate Limiting**

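```python
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
# Register the limiter and its 429 handler on the app
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@rt("/fetch_results")
@limiter.limit("10/minute")  # 10 requests per minute per IP
async def get_results(request, query: str, ranking: str):
    ...  # existing code
```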

#### **3. Input Validation**

```python
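from typing import Literal

from pydantic import BaseModel, validator  # on Pydantic v2, prefer field_validator

class SearchRequest(BaseModel):
    query: str
    # Only accept known ranking profiles
    ranking: Literal["hybrid", "colpali", "bm25"]

    @validator('query')
    def query_must_be_safe(cls, v):
        # Strip first so the length check applies to the cleaned value
        v = v.strip()
        if not v:
            raise ValueError('Query is required')
        if len(v) > 500:
            raise ValueError('Query too long')
        # Add sanitization logic
        return v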
```

#### **4. Request Origin Validation**

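```python
from starlette.responses import JSONResponse

ALLOWED_ORIGINS = ["http://localhost:3000", "https://yourdomain.com"]

@app.middleware("http")
async def cors_middleware(request, call_next):
    origin = request.headers.get("origin")
    # Browsers omit Origin on same-origin GET requests, so only reject
    # requests that declare an origin outside the allow-list
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return JSONResponse({"error": "Forbidden"}, status_code=403)
    return await call_next(request)
```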

### **📈 Recommended API Security Architecture**

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │  Rate Limiter   │    │   Backend API   │
│                 │    │                 │    │                 │
│ • API Key       │◄──►│ • IP Limiting   │◄──►│ • Input Valid.  │
│ • CORS Headers  │    │ • User Quotas   │    │ • Auth Checks   │
│ • Request Valid.│    │ • DoS Protection│    │ • Audit Logs    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

**Benefits:**

- **Layer 1**: Frontend validates requests before sending
- **Layer 2**: Rate limiter prevents abuse and DoS attacks
- **Layer 3**: Backend performs final validation and authorization

### **🔒 Security Implementation Checklist**

#### **Before Production Deployment:**

**CRITICAL (Must Do):**

- [ ] **Generate API Key**: Create strong API key for endpoint authentication
- [ ] **Enable Rate Limiting**: Implement per-IP request limits
- [ ] **Add Security Headers**: X-Frame-Options, CSP, X-Content-Type-Options
- [ ] **Input Validation**: Sanitize all user inputs (query, ranking)
- [ ] **CORS Configuration**: Restrict origins to known domains only
- [ ] **Environment Security**: Never commit API keys, use secure .env
- [ ] **HTTPS Only**: Force TLS in production (no HTTP)

**HIGH Priority:**

- [ ] **Audit Logging**: Log all API requests with IP and timestamp
- [ ] **Request Size Limits**: Prevent large payload attacks
- [ ] **Error Handling**: Don't expose stack traces or internal details
- [ ] **Session Security**: HTTP-only, secure, SameSite cookies
- [ ] **API Documentation**: Document authentication requirements

**MEDIUM Priority:**

- [ ] **User Authentication**: Consider adding user accounts for access control
- [ ] **Request Timeout**: Prevent long-running request abuse
- [ ] **Content Validation**: Verify response content types
- [ ] **Monitoring**: Set up alerts for unusual API usage patterns
- [ ] **Backup Strategy**: Secure backup of environment variables

#### **Security Testing Commands:**

**Test API Authentication:**

```bash
# Should fail without API key
curl "http://localhost:7860/fetch_results?query=test&ranking=hybrid"

# Should succeed with API key
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test&ranking=hybrid"
```

**Test Rate Limiting:**

```bash
# Run multiple requests to trigger rate limit
for i in {1..15}; do
  curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test$i&ranking=hybrid"
  echo "Request $i"
done
```

**Test Input Validation:**

```bash
# Should reject invalid/malicious inputs
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=<script>alert('xss')</script>&ranking=invalid"
```

**Test Security Headers:**

```bash
# Check security headers in response
curl -I "http://localhost:7860/"
# Should see: X-Frame-Options, X-Content-Type-Options, etc.
```

#### **Security Monitoring:**

**Log Analysis Queries:**

```bash
# Monitor API usage patterns
grep "API_ACCESS" /var/log/colpali.log | tail -100

# Detect potential abuse
grep "RATE_LIMIT_EXCEEDED" /var/log/colpali.log

# Check authentication failures
grep "UNAUTHORIZED" /var/log/colpali.log
```

**Alerting Setup:**

- **Rate Limit Violations**: Alert when >50 requests/minute from single IP
- **Authentication Failures**: Alert on repeated unauthorized attempts
- **Unusual Queries**: Alert on suspicious query patterns or injection attempts
- **Resource Usage**: Alert on high CPU/memory usage (potential DoS)
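
The first alert can be prototyped offline before wiring up a monitoring system. The sketch below assumes log lines in the `API_ACCESS: <ip> - <endpoint> - <query>` shape used in the audit logging example, and that the caller passes in the lines from a single one-minute slice; `noisy_ips` is a hypothetical helper.

```python
from collections import Counter


def noisy_ips(log_lines: list[str], threshold: int = 50) -> list[str]:
    """Return IPs that appear in more than `threshold` API_ACCESS lines."""
    marker = "API_ACCESS: "
    counts: Counter[str] = Counter()
    for line in log_lines:
        if marker not in line:
            continue
        # The IP is the first field after the marker, before " - "
        ip = line.split(marker, 1)[1].split(" - ", 1)[0]
        counts[ip] += 1
    return sorted(ip for ip, n in counts.items() if n > threshold)
```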

## 🧪 Models Used

- **ColPali v1.2**: Visual document understanding
- **PaliGemma 3B**: Base visual-language model
- **Google Gemini 2.0**: AI chat and question answering

## 🔧 Configuration Options

### Environment Variables

| Variable                   | Required | Description                                 | Security Impact                     |
| -------------------------- | -------- | ------------------------------------------- | ----------------------------------- |
| `VESPA_APP_TOKEN_URL`      | Yes\*    | Vespa application URL (token auth)          | **HIGH** - Remote access            |
| `VESPA_CLOUD_SECRET_TOKEN` | Yes\*    | Vespa secret token                          | **CRITICAL** - Full database access |
| `USE_MTLS`                 | No       | Use mTLS instead of token auth              | **MEDIUM** - Auth method            |
| `VESPA_APP_MTLS_URL`       | Yes\*\*  | Vespa application URL (mTLS)                | **HIGH** - Remote access            |
| `VESPA_CLOUD_MTLS_KEY`     | Yes\*\*  | mTLS private key                            | **CRITICAL** - TLS credentials      |
| `VESPA_CLOUD_MTLS_CERT`    | Yes\*\*  | mTLS certificate                            | **HIGH** - TLS credentials          |
| `GEMINI_API_KEY`           | No       | Google Gemini API key                       | **HIGH** - AI access/costs          |
| `LOG_LEVEL`                | No       | Logging level (DEBUG, INFO, WARNING, ERROR) | **LOW** - Debug info                |
| `HOT_RELOAD`               | No       | Enable hot reload in development            | **LOW** - Dev convenience           |
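
The conditional requirements in this table (token variables unless mTLS is enabled) can be checked at startup. The sketch below is one way to do that. It assumes `USE_MTLS` is parsed as the literal string `true`, which may not match how the application reads it, and `missing_vespa_settings` is a hypothetical helper.

```python
def missing_vespa_settings(env: dict[str, str]) -> list[str]:
    """Return the names of required Vespa variables that are unset or empty."""
    use_mtls = env.get("USE_MTLS", "false").strip().lower() == "true"
    if use_mtls:
        required = ["VESPA_APP_MTLS_URL", "VESPA_CLOUD_MTLS_KEY", "VESPA_CLOUD_MTLS_CERT"]
    else:
        required = ["VESPA_APP_TOKEN_URL", "VESPA_CLOUD_SECRET_TOKEN"]
    return [name for name in required if not env.get(name)]


# Token auth selected, but the secret token is missing
print(missing_vespa_settings({"VESPA_APP_TOKEN_URL": "https://your-app.vespa-cloud.com"}))
# ['VESPA_CLOUD_SECRET_TOKEN']
```

In the application this could be called as `missing_vespa_settings(dict(os.environ))` and the process stopped with a clear message if the list is non-empty.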

#### **🔒 Security-Related Environment Variables (Recommended)**

| Variable                   | Required      | Description                          | Default |
| -------------------------- | ------------- | ------------------------------------ | ------- |
| `COLPALI_API_KEY`          | **YES\*\*\*** | API key for endpoint authentication  | None    |
| `ALLOWED_ORIGINS`          | **YES\*\*\*** | Comma-separated allowed CORS origins | None    |
| `RATE_LIMIT_REQUESTS`      | No            | Max requests per minute per IP       | `10`    |
| `RATE_LIMIT_WINDOW`        | No            | Rate limit window in seconds         | `60`    |
| `MAX_QUERY_LENGTH`         | No            | Maximum query string length          | `500`   |
| `ENABLE_AUDIT_LOGGING`     | No            | Log all API requests for security    | `false` |
| `SECURITY_HEADERS_ENABLED` | No            | Enable security headers              | `true`  |
| `CSRF_SECRET`              | **YES\*\*\*** | Secret for CSRF token generation     | None    |

**Example Security-Enhanced `.env`:**

```bash
# Existing configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_vespa_secret_token
GEMINI_API_KEY=your_gemini_api_key

# NEW: Security configuration
COLPALI_API_KEY=your_strong_random_api_key_here
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60
MAX_QUERY_LENGTH=500
ENABLE_AUDIT_LOGGING=true
SECURITY_HEADERS_ENABLED=true
CSRF_SECRET=your_random_csrf_secret_here

# Development vs Production
NODE_ENV=production  # Enable secure cookies
LOG_LEVEL=INFO       # Don't expose debug info in production
```

\*Required for token authentication  
\*\*Required for mTLS authentication  
\*\*\*Required for production security
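
Values for `COLPALI_API_KEY` and `CSRF_SECRET` should be long and random. One way to generate them is the standard-library `secrets` module; `generate_secret` is a hypothetical helper, and 32 bytes of entropy is an assumed, not mandated, size.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string backed by `num_bytes` of entropy."""
    return secrets.token_urlsafe(num_bytes)


# Paste the printed lines into .env (never commit the file)
print(f"COLPALI_API_KEY={generate_secret()}")
print(f"CSRF_SECRET={generate_secret()}")
```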

## 🚨 Troubleshooting

### **LOCAL Processing Issues**

**ColPali model fails to load:**

```bash
# Check GPU memory
nvidia-smi  # For NVIDIA GPUs
# or
system_profiler SPDisplaysDataType  # For Apple Silicon

# Clear model cache if corrupted
rm -rf ~/.cache/huggingface/hub/models--vidore--colpali-v1.2
```

**Out of memory errors:**

- Reduce batch size in `feed_vespa.py` (try `batch_size=1`)
- Close other applications to free RAM/VRAM
- Use CPU processing if GPU memory insufficient: `CUDA_VISIBLE_DEVICES="" python main.py`

**Slow processing on CPU:**

- Expected behavior - ColPali requires significant computation
- Consider upgrading to GPU or Apple Silicon for 5-10x speedup
- Process documents overnight for large collections

### **REMOTE Processing Issues**

**Connection to Vespa fails:**

- Verify your Vespa URL and credentials in `.env`
- Check if the Vespa application is deployed and running
- Ensure network connectivity: `ping your-app.vespa-cloud.com`
- Validate authentication tokens haven't expired

**Document upload fails:**

- Check Vespa Cloud storage quota and billing
- Verify embedding format matches Vespa schema
- Ensure stable internet connection for large uploads

**Search returns no results:**

- Confirm documents were successfully uploaded to Vespa
- Check if embeddings were properly indexed
- Verify query processing isn't failing locally

### **MIXED (Local + Remote) Issues**

**Chat features don't work:**

- **LOCAL**: Verify document images are being generated locally
- **REMOTE**: Check `GEMINI_API_KEY` is set correctly
- **REMOTE**: Verify Gemini API quota and billing
- **NETWORK**: Ensure images can be sent to Gemini API

**Similarity maps missing:**

- **LOCAL**: Confirm ColPali model loaded successfully
- **LOCAL**: Check if similarity map generation completed
- **REMOTE**: Verify Vespa returned similarity data
- **BROWSER**: Clear browser cache for static files

### Performance Tips

**LOCAL Optimization:**

- Use GPU acceleration for 5-10x faster model inference
- Optimize batch sizes based on available memory
- Use SSD storage for faster model loading
- Consider quantized models for lower memory usage

**REMOTE Optimization:**

- Use Vespa's HNSW indexing for faster search
- Optimize embedding dimensions vs accuracy tradeoff
- Enable compression for faster network transfer
- Use multiple Vespa instances for high availability

**NETWORK Optimization:**

- Process documents in batches to reduce upload overhead
- Use compression for embedding transfer
- Consider regional Vespa deployment for lower latency

## 📄 License

Apache-2.0

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests and linting
5. Submit a pull request

## 📞 Support

For issues and questions:

- Check the troubleshooting section
- Review Vespa and ColPali documentation
- Open an issue on the repository