ColPali + Vespa - Visual Retrieval System
A visual document retrieval system that combines ColPali (a ColBERT-style late-interaction retriever built on the PaliGemma vision-language model) with Vespa for scalable document search and question answering.
Features
Visual Document Search
- Multi-modal retrieval: Search through PDF documents using natural language queries
- Visual understanding: ColPali model processes document images and text simultaneously
- Token-level similarity maps: Visualize exactly which parts of documents match your query
- Multiple ranking algorithms: Choose between hybrid, semantic, and other ranking methods
AI-Powered Chat
- Intelligent Q&A: Ask questions about retrieved documents using Google Gemini 2.0
- Context-aware responses: AI analyzes document images to provide accurate answers
- Real-time streaming: Get responses as they're generated
Scalable Infrastructure
- Vespa integration: Enterprise-grade search platform for large document collections
- Real-time processing: Search results and similarity maps generated on demand
- Cloud-ready: Supports Vespa Cloud deployment with secure authentication
Architecture
+---------------------+      +---------------------+      +---------------------+
| Frontend            |      | Backend             |      | Vespa Cloud         |
| (Browser)           |      | (Your Local         |      | (Remote)            |
|                     |      |  Computer)          |      |                     |
| - Search UI         |<---->| - ColPali Model     |<---->| - Document Store    |
| - Similarity Maps   |      | - Query Proc.       |      | - Vector Search     |
| - Chat Interface    |      | - Sim Map Gen.      |      | - Ranking           |
|                     |      | - Gemini Int.       |      |                     |
+---------------------+      +---------------------+      +---------------------+
           |                            |                            |
      Web Browser                   LOCAL AI                  REMOTE Storage
LOCAL Processing (Your Computer)
All ColPali model inference happens on YOUR local machine:
- ColPali Model: Runs locally on your GPU/CPU (~7GB model)
- Document Processing: PDF → Images → Embeddings (local)
- Query Processing: Text → Embeddings (local)
- Similarity Maps: Visual attention generation (local)
- Gemini Chat: Retrieved images are prepared locally, then sent to the Google Gemini API for answering
Device Detection:
from colpali_engine.utils.torch_utils import get_torch_device
device = get_torch_device("auto")  # Detects: CUDA, MPS (Apple), or CPU
print(f"Using device: {device}")  # Shows YOUR hardware
REMOTE Processing (Vespa Cloud)
Only storage and search index operations happen remotely:
- Document Storage: Stores processed embeddings (not raw models)
- Vector Search: Fast similarity search across document collection
- Query Routing: Handles search requests and ranking
- Metadata Storage: Document titles, URLs, page numbers
Complete Data Flow
Document Upload Process:
- LOCAL: Your computer downloads PDF from URL
- LOCAL: PDF pages are rendered to images (one per page)
- LOCAL: ColPali generates visual embeddings (1024 patches × 128 dims)
- LOCAL: Embeddings converted to binary format for efficiency
- REMOTE: Binary embeddings uploaded to Vespa Cloud
- REMOTE: Vespa indexes embeddings for fast search
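The binary step can be pictured as sign-based quantization: each of the 128 float dimensions of a patch becomes one bit (1 if the value is positive), packed into 16 bytes. This is a minimal pure-Python sketch under that assumption; the encoding actually used in feed_vespa.py may differ (for example, it may use NumPy's packbits), and the random vectors only stand in for real ColPali output.

```python
import random

PATCHES = 1024  # patch vectors per page, as described above
DIMS = 128      # dimensions per patch vector

def binarize(vector: list[float]) -> bytes:
    """Pack one float vector into bits: 1 where the value is > 0, else 0 (MSB first)."""
    if len(vector) % 8:
        raise ValueError("vector length must be a multiple of 8")
    out = bytearray()
    for start in range(0, len(vector), 8):
        byte = 0
        for value in vector[start:start + 8]:
            byte = (byte << 1) | (1 if value > 0 else 0)
        out.append(byte)
    return bytes(out)

# Simulated page embedding: random floats stand in for real ColPali output
rng = random.Random(0)
page = [[rng.uniform(-1, 1) for _ in range(DIMS)] for _ in range(PATCHES)]
packed = [binarize(patch) for patch in page]

float32_bytes = PATCHES * DIMS * 4           # 4 bytes per float32 value
binary_bytes = sum(len(p) for p in packed)   # 16 bytes per patch
print(f"float32: {float32_bytes:,} bytes, binary: {binary_bytes:,} bytes")
```

With these dimensions a page shrinks from 512 KB as float32 to 16 KB in binary form, a 32× reduction.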
Search Query Process:
- LOCAL: You enter search query in browser
- LOCAL: ColPali processes query → generates query embeddings
- REMOTE: Query embeddings sent to Vespa Cloud
- REMOTE: Vespa searches document index, returns matches
- LOCAL: ColPali generates similarity maps for results
- BROWSER: Results displayed with visual attention maps
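ColPali scores a page with ColBERT-style late interaction (MaxSim): for each query token embedding, take the highest dot product with any page patch, then sum over query tokens. The sketch below shows that scoring on small float vectors; the exact expression Vespa evaluates in this project is defined by the rank profiles in colpali-with-snippets/ and may differ (for example, it may operate on the binary embeddings).

```python
def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens: list[list[float]], page_patches: list[list[float]]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching patch similarity."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)

# Toy 2-dimensional example: two query tokens, two page patches
query = [[1.0, 0.0], [0.0, 1.0]]
patches = [[1.0, 0.0], [0.5, 0.5]]
print(maxsim(query, patches))  # 1.0 + 0.5 = 1.5
```

Because each query token picks its own best patch, a page scores well when different regions match different parts of the query.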
AI Chat Process:
- LOCAL: Retrieved document images processed by your machine
- REMOTE: Images + query sent to Google Gemini API
- REMOTE: Gemini generates response based on visual content
- BROWSER: Streaming response displayed in real-time
Core Components
- ColPali Model: Visual-language model for document understanding (LOCAL)
- Vespa Search: Distributed search and storage engine (REMOTE)
- FastHTML Frontend: Modern, responsive web interface (BROWSER)
- Gemini Integration: AI-powered question answering (REMOTE API)
- Similarity Map Generator: Visual attention visualization (LOCAL)
System Requirements
LOCAL Machine Requirements (For AI Processing)
Minimum:
- CPU: Modern multi-core processor (Intel/AMD/Apple Silicon)
- RAM: 8GB+ (16GB recommended)
- Storage: 10GB free space (for model cache)
- Python: 3.10+ (< 3.13)
Recommended:
- GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3070/4060 or better)
- Apple: M1/M2/M3 Mac (uses Metal Performance Shaders)
- RAM: 16GB+ for smoother processing
- Storage: SSD for faster model loading
Performance Examples:
- RTX 4090: ~1-2 seconds per query
- RTX 3070: ~3-5 seconds per query
- Apple M2: ~4-6 seconds per query
- CPU Only: ~15-30 seconds per query
REMOTE Requirements (Vespa Cloud)
What you need:
- Vespa Cloud account (handles all remote processing)
- Internet connection (for uploading embeddings and search queries)
- Authentication tokens (provided by Vespa Cloud)
What Vespa Cloud provides:
- Scalable storage for any number of documents
- Sub-second search across millions of embeddings
- High availability with automatic failover
- Global CDN for fast access worldwide
Cost Breakdown
FREE Components
- ColPali Model: Open source, runs locally (no per-query costs)
- Python Application: Apache-2.0 licensed, completely free
- Local Processing: Uses your own hardware (no cloud AI fees)
PAID Components
- Vespa Cloud: Pay for storage and search operations (figures below are rough estimates; check current Vespa Cloud pricing)
- ~$0.001 per 1000 searches
- ~$0.10 per GB storage per month
- Google Gemini API: Optional, for chat features only
- ~$0.01 per 1000 image tokens
- Only used when you ask questions about documents
Cost Examples (Monthly)
- Personal Use (100 documents, 1000 searches): ~$5-10/month
- Small Business (1000 documents, 10k searches): ~$20-50/month
- Enterprise (10k+ documents, 100k+ searches): $200+/month
Cost Optimization Tips:
- Use local Vespa installation to avoid cloud costs
- Disable Gemini chat if not needed (saves API costs)
- Process documents in batches to minimize upload time
Quick Start
Prerequisites
- Python 3.10+ (< 3.13)
- 8GB+ RAM for ColPali model
- Vespa Cloud account or local Vespa installation
- Google Gemini API key (optional, for chat features)
- GPU recommended but not required
1. Installation
# Clone the repository
git clone <repository-url>
cd colpali-vespa-visual-retrieval
# Install dependencies
pip install -e .
# For development
pip install -e ".[dev]"
# For document feeding capabilities
pip install -e ".[feed]"
2. Environment Configuration
Create a .env file with your configuration:
# Vespa Configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_secret_token
# Alternative: mTLS Authentication
USE_MTLS=false
VESPA_APP_MTLS_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_MTLS_KEY="-----BEGIN PRIVATE KEY-----..."
VESPA_CLOUD_MTLS_CERT="-----BEGIN CERTIFICATE-----..."
# Optional: Gemini AI (for chat features)
GEMINI_API_KEY=your_gemini_api_key
# Optional: Logging
LOG_LEVEL=INFO
HOT_RELOAD=false
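Token and mTLS authentication need different variables, so a half-filled .env is easy to miss. A small startup check like the one below can catch it early. This is a sketch, assuming the variables are already loaded into the process environment (for example by python-dotenv) and that USE_MTLS is the string "true" or "false"; the function name is hypothetical.

```python
import os
from collections.abc import Mapping

TOKEN_VARS = ("VESPA_APP_TOKEN_URL", "VESPA_CLOUD_SECRET_TOKEN")
MTLS_VARS = ("VESPA_APP_MTLS_URL", "VESPA_CLOUD_MTLS_KEY", "VESPA_CLOUD_MTLS_CERT")

def missing_vespa_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required Vespa variables that are unset or empty."""
    use_mtls = env.get("USE_MTLS", "false").strip().lower() == "true"
    required = MTLS_VARS if use_mtls else TOKEN_VARS
    return [name for name in required if not env.get(name, "").strip()]

missing = missing_vespa_settings()
if missing:
    print("Missing Vespa configuration:", ", ".join(missing))
```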
3. Deploy Vespa Application
# Deploy the Vespa schema and configuration
python deploy_vespa_app.py \
--tenant_name your_tenant \
--vespa_application_name colpalidemo \
--token_id_write colpalidemo_write \
--token_id_read colpalidemo_read
4. Run the Application
python main.py
The application will be available at http://localhost:7860
Document Management
Uploading Documents
Use the feeding script to process and upload PDF documents:
python feed_vespa.py \
--application_name colpalidemo \
--vespa_schema_name pdf_page
Document Processing Pipeline (LOCAL → REMOTE):
- PDF Download (LOCAL): Your computer downloads PDFs from URLs
- PDF Conversion (LOCAL): PDFs converted to images (one per page)
- ColPali Processing (LOCAL): Each page processed by ColPali model on YOUR GPU/CPU
- Embedding Generation (LOCAL): Visual embeddings created (1024 patches × 128 dimensions)
- Binary Encoding (LOCAL): Embeddings converted to efficient binary format
- Vespa Upload (REMOTE): Binary embeddings uploaded to Vespa Cloud
- Search Indexing (REMOTE): Vespa indexes embeddings for fast retrieval
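For orientation, one page could be fed as a Vespa JSON put operation like the one built below. The document type pdf_page matches the --vespa_schema_name argument above, but the namespace, the field names (url, title, page_number, embedding) and the hex-per-patch tensor encoding are illustrative assumptions; the authoritative field list is the schema in colpali-with-snippets/.

```python
import hashlib
import json

def page_document(url: str, title: str, page_number: int, patch_bytes: list[bytes]) -> dict:
    """Build a hypothetical Vespa feed operation for one PDF page."""
    # Deterministic ID: the same URL and page always map to the same document
    doc_key = hashlib.sha256(f"{url}#{page_number}".encode()).hexdigest()[:16]
    return {
        "put": f"id:pdf_page:pdf_page::{doc_key}",
        "fields": {
            "url": url,
            "title": title,
            "page_number": page_number,
            # One entry per patch: patch index -> 16 packed bytes as hex (assumed encoding)
            "embedding": {str(i): b.hex() for i, b in enumerate(patch_bytes)},
        },
    }

doc = page_document("https://example.com/report.pdf", "Annual report", 3,
                    [bytes([0xAA] * 16), bytes(16)])
print(json.dumps(doc)[:120])
```

Deriving the ID from URL and page number means re-running the feed overwrites pages instead of duplicating them.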
Important Notes:
- Processing Time: Expect 5-30 seconds per page depending on your hardware
- Network Usage: Only final embeddings are uploaded (about 16 KB per page in binary form: 1024 patches × 128 bits) vs ~1MB for the original page
- Privacy: Original PDFs and images stay on your local machine
- Storage: Raw images cached locally for similarity map generation
Supported Operations
- ✅ Upload Documents: Add new PDFs to the system
- ✅ Search Documents: Query existing documents
- ✅ View Documents: Browse stored documents
- ❌ Remove Documents: Not currently implemented
- ❌ Update Documents: Not currently implemented
Authentication & Security
Current Security Implementation
SECURE Components:
Vespa Authentication (REMOTE)
- Token Authentication: Bearer tokens for Vespa Cloud API access
- mTLS Certificates: Mutual TLS for enterprise security
- Encrypted Communication: HTTPS/TLS for all Vespa connections
API Key Management (LOCAL)
- Environment Variables: Sensitive keys stored in .env files
- API Key Rotation: Google Gemini supports key rotation
- Local Storage: Keys never transmitted except to authorized APIs
LIMITED Security Components:
Session Management
# Basic UUID session tracking (FastHTML)
import uuid
session["session_id"] = str(uuid.uuid4())
# HTTP-only cookies (Next.js)
cookieStore.set(SESSION_KEY, newSessionId, {
httpOnly: true,
secure: process.env.NODE_ENV === "production",
sameSite: "lax",
maxAge: 60 * 60 * 24 * 30, // 30 days
});
Basic Request Validation
# HTMX request validation (FastHTML)
if "hx-request" not in request.headers:
    return RedirectResponse("/search")
// Parameter validation (Next.js)
if (!query) {
  return NextResponse.json({ error: "Query is required" }, { status: 400 });
}
Security Limitations & Risks
MISSING Security Features:
❌ No API Authentication
- Local API endpoints are completely open
- No rate limiting or abuse protection
- No user authentication or authorization
- Anyone can access the /fetch_results and /get_sim_map endpoints
❌ No Input Sanitization
// Raw user input passed directly to models (Next.js route handler)
const query = searchParams.get("query"); // No validation/sanitization
const ranking = searchParams.get("ranking"); // No input filtering
❌ No Security Headers
- No CORS configuration
- No Content Security Policy (CSP)
- No X-Frame-Options protection
- No X-Content-Type-Options header
❌ No Rate Limiting
- Unlimited API requests
- No protection against DoS attacks
- No query throttling or user limits
❌ No CSRF Protection
- No token validation for state-changing operations
- Cross-site request forgery possible
Security Recommendations
IMMEDIATE (High Priority)
1. Add API Authentication
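// middleware.ts - Add API key validation
import { NextRequest, NextResponse } from "next/server";
export function middleware(request: NextRequest) {
  const apiKey = request.headers.get("X-API-Key");
  if (!apiKey || apiKey !== process.env.COLPALI_API_KEY) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }
  return NextResponse.next();
}
// Only run the check on API routes (assumes they live under /api)
export const config = { matcher: "/api/:path*" };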
2. Implement Rate Limiting
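// Use next-rate-limit or similar
import rateLimit from "@/lib/rate-limit";
const limiter = rateLimit({
  interval: 60 * 1000, // 1 minute
  uniqueTokenPerInterval: 500, // Max number of distinct IPs tracked per interval
});
// Inside a route handler (getClientIP is your own helper):
try {
  await limiter.check(10, getClientIP(request)); // 10 requests per minute per IP
} catch {
  return NextResponse.json({ error: "Too many requests" }, { status: 429 });
}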
3. Add Security Headers
// next.config.js
const securityHeaders = [
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
  {
    key: "Content-Security-Policy",
    value: "default-src 'self'; script-src 'self' 'unsafe-inline'",
  },
];
// Apply the headers to every route
module.exports = {
  async headers() {
    return [{ source: "/(.*)", headers: securityHeaders }];
  },
};
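The next.config.js headers only cover a Next.js frontend. For the Python (FastHTML/Starlette) backend, the same headers can be attached by a small ASGI middleware. This is a framework-agnostic sketch, not code from this repository; the header values are copied from the list above, the class name is hypothetical, and the CSP may need loosening if the UI loads scripts from a CDN.

```python
SECURITY_HEADERS = [
    (b"x-frame-options", b"DENY"),
    (b"x-content-type-options", b"nosniff"),
    (b"referrer-policy", b"strict-origin-when-cross-origin"),
    (b"content-security-policy", b"default-src 'self'; script-src 'self' 'unsafe-inline'"),
]

class SecurityHeadersMiddleware:
    """ASGI middleware that adds security headers to every HTTP response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                existing = {name.lower() for name, _ in headers}
                # Do not override headers the route already set
                extra = [(n, v) for n, v in SECURITY_HEADERS if n not in existing]
                message = {**message, "headers": headers + extra}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```

In a Starlette-based app it can be registered with app.add_middleware(SecurityHeadersMiddleware).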
4. Input Validation & Sanitization
import { z } from "zod";
const SearchSchema = z.object({
query: z
.string()
.min(1)
.max(500)
.regex(/^[a-zA-Z0-9\s\.\?\!]*$/),
ranking: z.enum(["hybrid", "colpali", "bm25"]),
});
MEDIUM Priority
5. CORS Configuration
// Restrict origins to known domains
const corsHeaders = {
"Access-Control-Allow-Origin": "https://yourdomain.com",
"Access-Control-Allow-Methods": "GET, POST, OPTIONS",
"Access-Control-Allow-Headers": "Content-Type, Authorization",
};
6. Request Size Limits
// Limit request payload sizes
export const config = {
api: {
bodyParser: {
sizeLimit: "1mb",
},
},
};
7. Audit Logging
# Log all API access with IP, timestamp, and queries
logger.info(f"API_ACCESS: {client_ip} - {endpoint} - {query[:100]}")
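Writing raw queries into logs lets a user forge extra log lines by embedding newlines. A minimal sketch of the audit line above with that handled, using only the standard logging module; the logger name is an assumption, and the 100-character truncation mirrors the example.

```python
import logging

logger = logging.getLogger("colpali.audit")

def safe_for_log(value: str, limit: int = 100) -> str:
    """Truncate, then escape control characters so one request stays one log line."""
    return value[:limit].encode("unicode_escape").decode("ascii")

def audit(client_ip: str, endpoint: str, query: str) -> str:
    """Log one API access in the API_ACCESS format and return the logged line."""
    line = f"API_ACCESS: {client_ip} - {endpoint} - {safe_for_log(query)}"
    logger.info(line)
    return line

logging.basicConfig(level=logging.INFO)
audit("203.0.113.7", "/fetch_results", "revenue\nAPI_ACCESS: 10.0.0.1 - /admin - forged")
```

The forged second entry is logged as a literal \n inside the first line, so grep "API_ACCESS" still counts one request.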
LONG-TERM (Production Ready)
8. User Authentication (Optional)
// Add NextAuth.js or similar for user accounts
// Implement role-based access control
// Add document ownership and permissions
9. Network Security
# Deploy behind reverse proxy (nginx/cloudflare)
# Enable DDoS protection
# Use Web Application Firewall (WAF)
10. Data Privacy Controls
// Implement data retention policies
// Add user data deletion capabilities
// GDPR compliance features
Security Best Practices
For LOCAL Development:
- Never commit API keys to version control
- Use specific environment variable names (avoid generic names like API_KEY)
- Rotate API keys regularly (monthly)
- Enable firewall on development machines
- Use HTTPS even locally for production testing
For PRODUCTION Deployment:
- Deploy behind CDN/WAF (Cloudflare, AWS Shield)
- Enable rate limiting at infrastructure level
- Use container security scanning
- Implement monitoring and alerting
- Regular security audits and penetration testing
For REMOTE Services:
- Vespa Cloud: Follows enterprise security standards
- Gemini API: Google-managed security and compliance
- Environment Isolation: Separate dev/staging/prod credentials
Current Risk Level: MEDIUM
Suitable for:
- ✅ Personal projects and demos
- ✅ Internal company tools (behind firewall)
- ✅ Research and development environments
NOT suitable for:
- ❌ Public internet deployment
- ❌ Customer-facing applications
- ❌ Production environments with sensitive data
- ❌ Commercial applications without security hardening
Usage Guide
Basic Search
- Navigate to the homepage
- Enter your search query in natural language
- Select ranking method (hybrid, semantic, etc.)
- View results with similarity maps
Similarity Maps
- Click on token buttons to see which parts of documents match specific query terms
- Visual heatmaps show how strongly each image patch matches the selected query token
- Reset button returns to original document view
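A similarity map for one query token is that token's dot product with every page patch, laid out on the patch grid and scaled to 0..1 for display. The sketch assumes the 1024 patches form a 32 × 32 grid in row-major order (consistent with 1024 patches per page); the app's actual rendering may also upsample the grid and blend it over the page image. The toy call uses a 2 × 2 grid so the result is easy to check.

```python
def similarity_map(token: list[float], patches: list[list[float]], grid: int = 32) -> list[list[float]]:
    """Score every patch against one query token and reshape into a grid x grid heatmap in [0, 1]."""
    if len(patches) != grid * grid:
        raise ValueError(f"expected {grid * grid} patches, got {len(patches)}")
    scores = [sum(t * p for t, p in zip(token, patch)) for patch in patches]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero on a uniform page
    normalized = [(s - lo) / span for s in scores]
    # Row-major layout: patch i sits at row i // grid, column i % grid
    return [normalized[row * grid:(row + 1) * grid] for row in range(grid)]

# Toy 2 x 2 grid with 2-dimensional embeddings
heatmap = similarity_map([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.0]], grid=2)
print(heatmap)
```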
AI Chat
- Ask questions about retrieved documents
- Chat responses are based on document content
- Streaming responses for real-time interaction
Search Rankings
- Hybrid: Combines multiple ranking signals
- Semantic: Pure semantic similarity
- BM25: Traditional text-based ranking
- ColPali: Visual-first ranking
Development
Project Structure
├── main.py                  # Application entry point
├── backend/
│   ├── colpali.py           # ColPali model integration
│   ├── vespa_app.py         # Vespa client and queries
│   └── modelmanager.py      # Model management utilities
├── frontend/
│   ├── app.py               # UI components
│   └── layout.py            # Layout templates
├── feed_vespa.py            # Document upload script
├── deploy_vespa_app.py      # Vespa deployment script
├── colpali-with-snippets/   # Vespa schema definitions
└── static/                  # Static assets and generated files
Running in Development
# Enable hot reload
export HOT_RELOAD=true
python main.py
# Or set in .env
echo "HOT_RELOAD=true" >> .env
Code Quality
# Format code
ruff format .
# Lint code
ruff check .
API Endpoints
Current API Routes (⚠️ UNSECURED)
| Endpoint | Method | Description | Security Status |
|---|---|---|---|
| / | GET | Homepage | ✅ Public (safe) |
| /search | GET | Search interface | ✅ Public (safe) |
| /fetch_results | GET | Fetch search results | ⚠️ OPEN API |
| /get_sim_map | GET | Get similarity maps | ⚠️ OPEN API |
| /get-message | GET | Chat with AI (SSE) | ⚠️ OPEN API |
| /full_image | GET | Get full document image | ⚠️ OPEN API |
| /suggestions | GET | Query autocomplete | ⚠️ OPEN API |
| /static/* | GET | Static file serving | ✅ Public (safe) |
Security Analysis by Endpoint
SECURE Endpoints
- / and /search: Static HTML pages, no sensitive data
- /static/*: Public assets (CSS, JS, images)
UNSECURED Endpoints (Risk)
/fetch_results - HIGH RISK
# Anyone can perform unlimited searches
curl "http://localhost:7860/fetch_results?query=secret&ranking=hybrid"
- Risks: Resource abuse, server overload, competitive intelligence gathering
- Exposes: Search capabilities, document metadata, processing times
/get_sim_map - MEDIUM RISK
# Access similarity maps without authentication
curl "http://localhost:7860/get_sim_map?query_id=123&idx=0&token=word&token_idx=5"
- Risks: Unauthorized access to visual analysis
- Exposes: Document visual patterns, query insights
/get-message - HIGH RISK
# Trigger AI processing without limits
curl "http://localhost:7860/get-message?query_id=123&query=question&doc_ids=doc1,doc2"
- Risks: Gemini API abuse, cost exploitation, resource exhaustion
- Exposes: AI-generated insights, document content analysis
/full_image - HIGH RISK
# Download any document image
curl "http://localhost:7860/full_image?doc_id=any_document_id"
- Risks: Unauthorized document access, data leakage
- Exposes: Full document images, potentially sensitive content
Immediate Security Fixes
1. Add API Key Authentication
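# Python FastHTML (Starlette) middleware
import hmac
import os
from starlette.responses import JSONResponse
PROTECTED_PREFIXES = ("/fetch_results",)  # extend with other open endpoints as needed
@app.middleware("http")
async def verify_api_key(request, call_next):
    if request.url.path.startswith(PROTECTED_PREFIXES):
        api_key = request.headers.get("X-API-Key", "")
        expected = os.getenv("COLPALI_API_KEY", "")
        # Fail closed if no key is configured; compare in constant time
        if not expected or not hmac.compare_digest(api_key.encode(), expected.encode()):
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)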
2. Implement Rate Limiting
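from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
@rt("/fetch_results")
@limiter.limit("10/minute")  # 10 requests per minute per IP
async def get_results(request, query: str, ranking: str):
    ...  # existing handler code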
3. Input Validation
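from pydantic import BaseModel, field_validator
ALLOWED_RANKINGS = {"hybrid", "colpali", "bm25"}
class SearchRequest(BaseModel):
    query: str
    ranking: str
    @field_validator("query")
    @classmethod
    def query_must_be_safe(cls, v: str) -> str:
        v = v.strip()
        if not v or len(v) > 500:
            raise ValueError("Query must be 1-500 characters")
        # Add sanitization logic
        return v
    @field_validator("ranking")
    @classmethod
    def ranking_must_be_known(cls, v: str) -> str:
        if v not in ALLOWED_RANKINGS:
            raise ValueError(f"Unknown ranking: {v}")
        return v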
4. Request Origin Validation
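from starlette.responses import JSONResponse
# Include the app's own origin so its browser requests are not rejected
ALLOWED_ORIGINS = ["http://localhost:7860", "http://localhost:3000", "https://yourdomain.com"]
@app.middleware("http")
async def origin_check_middleware(request, call_next):
    origin = request.headers.get("origin")
    # Same-origin GETs and non-browser clients usually send no Origin header
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return JSONResponse({"error": "Forbidden"}, status_code=403)
    return await call_next(request)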
Recommended API Security Architecture
+---------------------+      +---------------------+      +---------------------+
| Frontend            |      | Rate Limiter        |      | Backend API         |
|                     |      |                     |      |                     |
| - API Key           |----->| - IP Limiting       |----->| - Input Valid.      |
| - CORS Headers      |      | - User Quotas       |      | - Auth Checks       |
| - Request Valid.    |      | - DoS Protection    |      | - Audit Logs        |
+---------------------+      +---------------------+      +---------------------+
Benefits:
- Layer 1: Frontend validates requests before sending (a usability aid only; it can be bypassed, so the backend must validate again)
- Layer 2: Rate limiter prevents abuse and DoS attacks
- Layer 3: Backend performs final validation and authorization
Security Implementation Checklist
Before Production Deployment:
CRITICAL (Must Do):
- Generate API Key: Create strong API key for endpoint authentication
- Enable Rate Limiting: Implement per-IP request limits
- Add Security Headers: X-Frame-Options, CSP, X-Content-Type-Options
- Input Validation: Sanitize all user inputs (query, ranking)
- CORS Configuration: Restrict origins to known domains only
- Environment Security: Never commit API keys, use secure .env
- HTTPS Only: Force TLS in production (no HTTP)
HIGH Priority:
- Audit Logging: Log all API requests with IP and timestamp
- Request Size Limits: Prevent large payload attacks
- Error Handling: Don't expose stack traces or internal details
- Session Security: HTTP-only, secure, SameSite cookies
- API Documentation: Document authentication requirements
MEDIUM Priority:
- User Authentication: Consider adding user accounts for access control
- Request Timeout: Prevent long-running request abuse
- Content Validation: Verify response content types
- Monitoring: Set up alerts for unusual API usage patterns
- Backup Strategy: Secure backup of environment variables
Security Testing Commands:
Test API Authentication:
# Should fail without API key
curl "http://localhost:7860/fetch_results?query=test&ranking=hybrid"
# Should succeed with API key
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test&ranking=hybrid"
Test Rate Limiting:
# Run multiple requests to trigger rate limit (expect 429 once the limit is hit)
for i in {1..15}; do
  curl -s -o /dev/null -w "Request $i: %{http_code}\n" -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test$i&ranking=hybrid"
done
Test Input Validation:
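# Should reject invalid/malicious inputs (-G with --data-urlencode builds a properly encoded query string)
curl -G -H "X-API-Key: your_api_key" --data-urlencode "query=<script>alert('xss')</script>" --data-urlencode "ranking=invalid" "http://localhost:7860/fetch_results"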
Test Security Headers:
# Check security headers in response
curl -I "http://localhost:7860/"
# Should see: X-Frame-Options, X-Content-Type-Options, etc.
Security Monitoring:
Log Analysis Queries:
# Monitor API usage patterns
grep "API_ACCESS" /var/log/colpali.log | tail -100
# Detect potential abuse
grep "RATE_LIMIT_EXCEEDED" /var/log/colpali.log
# Check authentication failures
grep "UNAUTHORIZED" /var/log/colpali.log
Alerting Setup:
- Rate Limit Violations: Alert when >50 requests/minute from single IP
- Authentication Failures: Alert on repeated unauthorized attempts
- Unusual Queries: Alert on suspicious query patterns or injection attempts
- Resource Usage: Alert on high CPU/memory usage (potential DoS)
Models Used
- ColPali v1.2: Visual document understanding
- ColPaliGemma 3B: Base visual-language model
- Google Gemini 2.0: AI chat and question answering
Configuration Options
Environment Variables
| Variable | Required | Description | Security Impact |
|---|---|---|---|
| VESPA_APP_TOKEN_URL | Yes* | Vespa application URL (token auth) | HIGH - Remote access |
| VESPA_CLOUD_SECRET_TOKEN | Yes* | Vespa secret token | CRITICAL - Full database access |
| USE_MTLS | No | Use mTLS instead of token auth | MEDIUM - Auth method |
| VESPA_APP_MTLS_URL | Yes** | Vespa application URL (mTLS) | HIGH - Remote access |
| VESPA_CLOUD_MTLS_KEY | Yes** | mTLS private key | CRITICAL - TLS credentials |
| VESPA_CLOUD_MTLS_CERT | Yes** | mTLS certificate | HIGH - TLS credentials |
| GEMINI_API_KEY | No | Google Gemini API key | HIGH - AI access/costs |
| LOG_LEVEL | No | Logging level (DEBUG, INFO, WARNING, ERROR) | LOW - Debug info |
| HOT_RELOAD | No | Enable hot reload in development | LOW - Dev convenience |
Security-Related Environment Variables (Recommended)
| Variable | Required | Description | Default |
|---|---|---|---|
| COLPALI_API_KEY | Yes*** | API key for endpoint authentication | None |
| ALLOWED_ORIGINS | Yes*** | Comma-separated allowed CORS origins | None |
| RATE_LIMIT_REQUESTS | No | Max requests per IP per rate-limit window | 10 |
| RATE_LIMIT_WINDOW | No | Rate limit window in seconds | 60 |
| MAX_QUERY_LENGTH | No | Maximum query string length | 500 |
| ENABLE_AUDIT_LOGGING | No | Log all API requests for security | false |
| SECURITY_HEADERS_ENABLED | No | Enable security headers | true |
| CSRF_SECRET | Yes*** | Secret for CSRF token generation | None |
Example Security-Enhanced .env:
# Existing configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_vespa_secret_token
GEMINI_API_KEY=your_gemini_api_key
# NEW: Security configuration
COLPALI_API_KEY=your_strong_random_api_key_here
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60
MAX_QUERY_LENGTH=500
ENABLE_AUDIT_LOGGING=true
SECURITY_HEADERS_ENABLED=true
CSRF_SECRET=your_random_csrf_secret_here
# Development vs Production
NODE_ENV=production # Enable secure cookies
LOG_LEVEL=INFO # Don't expose debug info in production
*Required for token authentication
**Required for mTLS authentication
***Required for production security
Troubleshooting
LOCAL Processing Issues
ColPali model fails to load:
# Check GPU memory
nvidia-smi # For NVIDIA GPUs
# or
system_profiler SPDisplaysDataType # For Apple Silicon
# Clear model cache if corrupted
rm -rf ~/.cache/huggingface/hub/models--vidore--colpali-v1.2
Out of memory errors:
- Reduce batch size in feed_vespa.py (try batch_size=1)
- Close other applications to free RAM/VRAM
- Use CPU processing if GPU memory is insufficient (NVIDIA GPUs; this hides CUDA devices from PyTorch):
CUDA_VISIBLE_DEVICES="" python main.py
Slow processing on CPU:
- Expected behavior - ColPali requires significant computation
- Consider upgrading to GPU or Apple Silicon for 5-10x speedup
- Process documents overnight for large collections
REMOTE Processing Issues
Connection to Vespa fails:
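- Verify your Vespa URL and credentials in .env
- Check if the Vespa application is deployed and running
- Ensure network connectivity (ping may be blocked; any HTTP response, even 401/403, confirms the endpoint is reachable):
curl -I https://your-app.vespa-cloud.com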
- Validate authentication tokens haven't expired
Document upload fails:
- Check Vespa Cloud storage quota and billing
- Verify embedding format matches Vespa schema
- Ensure stable internet connection for large uploads
Search returns no results:
- Confirm documents were successfully uploaded to Vespa
- Check if embeddings were properly indexed
- Verify query processing isn't failing locally
MIXED (Local + Remote) Issues
Chat features don't work:
- LOCAL: Verify document images are being generated locally
- REMOTE: Check GEMINI_API_KEY is set correctly
- REMOTE: Verify Gemini API quota and billing
- NETWORK: Ensure images can be sent to Gemini API
Similarity maps missing:
- LOCAL: Confirm ColPali model loaded successfully
- LOCAL: Check if similarity map generation completed
- REMOTE: Verify Vespa returned similarity data
- BROWSER: Clear browser cache for static files
Performance Tips
LOCAL Optimization:
- Use GPU acceleration for 5-10x faster model inference
- Optimize batch sizes based on available memory
- Use SSD storage for faster model loading
- Consider quantized models for lower memory usage
REMOTE Optimization:
- Use Vespa's HNSW indexing for faster search
- Optimize embedding dimensions vs accuracy tradeoff
- Enable compression for faster network transfer
- Use multiple Vespa instances for high availability
NETWORK Optimization:
- Process documents in batches to reduce upload overhead
- Use compression for embedding transfer
- Consider regional Vespa deployment for lower latency
License
Apache-2.0
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting
- Submit a pull request
Support
For issues and questions:
- Check the troubleshooting section
- Review Vespa and ColPali documentation
- Open an issue on the repository