# ColPali 🤝 Vespa - Visual Retrieval System

A visual document retrieval system that combines **ColPali** (Contextualized Late Interaction over PaliGemma) with **Vespa** for scalable document search and question answering.

## 🌟 Features

### 🔍 **Visual Document Search**

- **Multi-modal retrieval**: Search through PDF documents using natural language queries
- **Visual understanding**: ColPali model processes document images and text simultaneously
- **Token-level similarity maps**: Visualize exactly which parts of documents match your query
- **Multiple ranking algorithms**: Choose between hybrid, semantic, and other ranking methods

### 🧠 **AI-Powered Chat**

- **Intelligent Q&A**: Ask questions about retrieved documents using Google Gemini 2.0
- **Context-aware responses**: AI analyzes document images to provide accurate answers
- **Real-time streaming**: Get responses as they're generated

### ⚡ **Scalable Infrastructure**

- **Vespa integration**: Enterprise-grade search platform for large document collections
- **Real-time processing**: Instant search results and similarity map generation
- **Cloud-ready**: Supports Vespa Cloud deployment with secure authentication

## 🏗️ Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │    Backend      │    │   Vespa Cloud   │
│   (Browser)     │    │   (Your Local   │    │   (Remote)      │
│                 │    │    Computer)    │    │                 │
│ • Search UI     │◄──►│ • ColPali Model │◄──►│ • Document Store│
│ • Similarity    │    │ • Query Proc.   │    │ • Vector Search │
│   Maps          │    │ • Sim Map Gen.  │    │ • Ranking       │
│ • Chat Interface│    │ • Gemini Int.   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
        ↑                        ↑                        ↑
   Web Browser              LOCAL AI               REMOTE Storage
```

### 🏠 **LOCAL Processing (Your Computer)**

**All ColPali model inference happens on YOUR local machine:**

- **ColPali Model**: Runs locally on your GPU/CPU (~7GB model)
- **Document Processing**: PDF → Images → Embeddings (local)
- **Query Processing**: Text → Embeddings (local)
- **Similarity Maps**: Visual attention generation (local)
- **Gemini Chat**: Retrieved images are prepared locally, then sent to the remote Gemini API for answer generation

**Device Detection:**

```python
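# Assumed source of this helper: the colpali-engine package
from colpali_engine.utils.torch_utils import get_torch_device

device = get_torch_device("auto")  # Detects: CUDA, MPS (Apple), or CPU
print(f"Using device: {device}")   # Shows YOUR hardware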
```

### ☁️ **REMOTE Processing (Vespa Cloud)**

**Only storage and search index operations happen remotely:**

- **Document Storage**: Stores processed embeddings (not raw models)
- **Vector Search**: Fast similarity search across document collection
- **Query Routing**: Handles search requests and ranking
- **Metadata Storage**: Document titles, URLs, page numbers

### 🔄 **Complete Data Flow**

#### **Document Upload Process:**

1. **LOCAL**: Your computer downloads PDF from URL
2. **LOCAL**: ColPali converts PDF pages to images
3. **LOCAL**: ColPali generates visual embeddings (1024 patches × 128 dims)
4. **LOCAL**: Embeddings converted to binary format for efficiency
5. **REMOTE**: Binary embeddings uploaded to Vespa Cloud
6. **REMOTE**: Vespa indexes embeddings for fast search
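
Step 4 can be pictured as sign-based binarization followed by bit packing. The sketch below assumes that "binary format" means one bit per dimension, set when the float value is positive; the helper name `binarize_embeddings` and the random input are illustrative only and are not taken from this project's code.

```python
import numpy as np


def binarize_embeddings(embeddings: np.ndarray) -> np.ndarray:
    """Pack float patch embeddings into one bit per dimension.

    Assumption: a dimension maps to 1 when its value is positive, else 0.
    """
    bits = (embeddings > 0).astype(np.uint8)   # (patches, dims) of 0/1
    return np.packbits(bits, axis=1)           # (patches, dims // 8) bytes


# Hypothetical page: 1024 patches x 128 dims of float32 values.
rng = np.random.default_rng(0)
page = rng.standard_normal((1024, 128)).astype(np.float32)

packed = binarize_embeddings(page)
print(page.nbytes, "bytes as float32")   # 1024 * 128 * 4
print(packed.nbytes, "bytes packed")     # 1024 * 16
```

Under these assumptions each patch shrinks from 512 bytes to 16 bytes, a 32x reduction per page.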

#### **Search Query Process:**

1. **LOCAL**: You enter search query in browser
2. **LOCAL**: ColPali processes query → generates query embeddings
3. **REMOTE**: Query embeddings sent to Vespa Cloud
4. **REMOTE**: Vespa searches document index, returns matches
5. **LOCAL**: ColPali generates similarity maps for results
6. **BROWSER**: Results displayed with visual attention maps
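
The matching in step 4 runs inside Vespa, and its rank profile is not shown in this README. As a rough illustration of the late-interaction (MaxSim) scoring that ColPali-style retrieval is built on, here is a NumPy sketch on float embeddings. The function name and the toy vectors are assumptions, and a real deployment may use a binary approximation instead.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction score: for each query token take its best-matching
    patch (max dot product), then sum those maxima over the query tokens."""
    sim = query_emb @ page_emb.T          # (query_tokens, patches)
    return float(sim.max(axis=1).sum())


# Toy example: 2 query tokens, 3 patches, 2 dims (the README quotes 128 dims).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
page_a = np.array([[0.9, 0.1], [0.2, 0.8], [0.0, 0.0]])
page_b = np.array([[0.5, 0.5], [0.1, 0.1], [0.3, 0.2]])

scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
best = max(scores, key=scores.get)
print(scores, best)
```

Here `page_a` wins because each query token finds a strongly matching patch, even though no single patch matches both tokens.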

#### **AI Chat Process:**

1. **LOCAL**: Retrieved document images processed by your machine
2. **REMOTE**: Images + query sent to Google Gemini API
3. **REMOTE**: Gemini generates response based on visual content
4. **BROWSER**: Streaming response displayed in real-time
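
The `/get-message` endpoint is listed as SSE in the API table, so step 4 presumably relies on Server-Sent Events. The sketch below shows only the SSE wire framing; `sse_events` and the closing `close` event are hypothetical names, not the project's actual handler.

```python
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Frame text chunks as Server-Sent Events messages.

    Each message is one or more `data:` lines followed by a blank line.
    """
    for chunk in chunks:
        lines = chunk.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
    # Hypothetical end-of-stream marker so the browser can close the connection.
    yield "event: close\ndata: \n\n"


# Hypothetical model output arriving in pieces.
frames = list(sse_events(["The chart shows", " revenue growth.\nSee page 3."]))
print(frames)
```

A chunk containing a newline becomes several `data:` lines in one message, which the browser's `EventSource` joins back together with newlines.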

### Core Components

- **ColPali Model**: Visual-language model for document understanding (LOCAL)
- **Vespa Search**: Distributed search and storage engine (REMOTE)
- **FastHTML Frontend**: Modern, responsive web interface (BROWSER)
- **Gemini Integration**: AI-powered question answering (REMOTE API)
- **Similarity Map Generator**: Visual attention visualization (LOCAL)

## 💻 **System Requirements**

### **LOCAL Machine Requirements (For AI Processing)**

**Minimum:**

- **CPU**: Modern multi-core processor (Intel/AMD/Apple Silicon)
- **RAM**: 8GB+ (16GB recommended)
- **Storage**: 10GB free space (for model cache)
- **Python**: 3.10+ (< 3.13)

**Recommended:**

- **GPU**: NVIDIA GPU with 8GB+ VRAM (RTX 3070/4060 or better)
- **Apple**: M1/M2/M3 Mac (uses Metal Performance Shaders)
- **RAM**: 16GB+ for smoother processing
- **Storage**: SSD for faster model loading

**Performance Examples:**

- **RTX 4090**: ~1-2 seconds per query
- **RTX 3070**: ~3-5 seconds per query
- **Apple M2**: ~4-6 seconds per query
- **CPU Only**: ~15-30 seconds per query

### **REMOTE Requirements (Vespa Cloud)**

**What you need:**

- **Vespa Cloud account** (handles all remote processing)
- **Internet connection** (for uploading embeddings and search queries)
- **Authentication tokens** (provided by Vespa Cloud)

**What Vespa Cloud provides:**

- **Scalable storage** for any number of documents
- **Sub-second search** across millions of embeddings
- **High availability** with automatic failover
- **Global CDN** for fast access worldwide

## 💰 **Cost Breakdown**

### **FREE Components**

- **ColPali Model**: Open source, runs locally (no per-query costs)
- **Python Application**: MIT/Apache licensed, completely free
- **Local Processing**: Uses your own hardware (no cloud AI fees)

### **PAID Components**

- **Vespa Cloud**: Pay for storage and search operations
  - ~$0.001 per 1000 searches
  - ~$0.10 per GB storage per month
- **Google Gemini API**: Optional, for chat features only
  - ~$0.01 per 1000 image tokens
  - Only used when you ask questions about documents

### **Cost Examples (Monthly)**

- **Personal Use** (100 documents, 1000 searches): ~$5-10/month
- **Small Business** (1000 documents, 10k searches): ~$20-50/month
- **Enterprise** (10k+ documents, 100k+ searches): $200+/month

**💡 Cost Optimization Tips:**

- Use local Vespa installation to avoid cloud costs
- Disable Gemini chat if not needed (saves API costs)
- Process documents in batches to minimize upload time

## 🚀 Quick Start

### Prerequisites

- Python 3.10+ (< 3.13)
- **8GB+ RAM** for ColPali model
- **Vespa Cloud account** or local Vespa installation
- **Google Gemini API key** (optional, for chat features)
- **GPU recommended** but not required

### 1. Installation

```bash
# Clone the repository
git clone <repository-url>
cd colpali-vespa-visual-retrieval

# Install dependencies
pip install -e .

# For development
pip install -e ".[dev]"

# For document feeding capabilities
pip install -e ".[feed]"
```

### 2. Environment Configuration

Create a `.env` file with your configuration:

```bash
# Vespa Configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_secret_token

# Alternative: mTLS Authentication
USE_MTLS=false
VESPA_APP_MTLS_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_MTLS_KEY="-----BEGIN PRIVATE KEY-----..."
VESPA_CLOUD_MTLS_CERT="-----BEGIN CERTIFICATE-----..."

# Optional: Gemini AI (for chat features)
GEMINI_API_KEY=your_gemini_api_key

# Optional: Logging
LOG_LEVEL=INFO
HOT_RELOAD=false
```
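
To make the choice between the two authentication modes concrete, here is a minimal sketch of how an application could read these variables. It assumes the `.env` values have already been loaded into the process environment; `load_vespa_settings` is a hypothetical helper, not a function from this repository.

```python
import os
from typing import Mapping


def load_vespa_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Pick token or mTLS settings based on USE_MTLS and check required keys."""
    use_mtls = env.get("USE_MTLS", "false").strip().lower() == "true"
    if use_mtls:
        required = ["VESPA_APP_MTLS_URL", "VESPA_CLOUD_MTLS_KEY", "VESPA_CLOUD_MTLS_CERT"]
    else:
        required = ["VESPA_APP_TOKEN_URL", "VESPA_CLOUD_SECRET_TOKEN"]

    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")

    settings = {name: env[name] for name in required}
    settings["auth"] = "mtls" if use_mtls else "token"
    return settings


# Example with an in-memory mapping instead of the real environment.
example = {
    "VESPA_APP_TOKEN_URL": "https://your-app.vespa-cloud.com",
    "VESPA_CLOUD_SECRET_TOKEN": "your_secret_token",
}
print(load_vespa_settings(example)["auth"])
```

Failing fast on a missing variable gives a clearer error at startup than a rejected request to Vespa later on.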

### 3. Deploy Vespa Application

```bash
# Deploy the Vespa schema and configuration
python deploy_vespa_app.py \
  --tenant_name your_tenant \
  --vespa_application_name colpalidemo \
  --token_id_write colpalidemo_write \
  --token_id_read colpalidemo_read
```

### 4. Run the Application

```bash
python main.py
```

The application will be available at `http://localhost:7860`

## 📚 Document Management

### Uploading Documents

Use the feeding script to process and upload PDF documents:

```bash
python feed_vespa.py \
  --application_name colpalidemo \
  --vespa_schema_name pdf_page
```

**Document Processing Pipeline (LOCAL → REMOTE):**

1. **PDF Download** (LOCAL): Your computer downloads PDFs from URLs
2. **PDF Conversion** (LOCAL): PDFs converted to images (one per page)
3. **ColPali Processing** (LOCAL): Each page processed by ColPali model on YOUR GPU/CPU
4. **Embedding Generation** (LOCAL): Visual embeddings created (1024 patches × 128 dimensions)
5. **Binary Encoding** (LOCAL): Embeddings converted to efficient binary format
6. **Vespa Upload** (REMOTE): Binary embeddings uploaded to Vespa Cloud
7. **Search Indexing** (REMOTE): Vespa indexes embeddings for fast retrieval

**⚠️ Important Notes:**

- **Processing Time**: Expect 5-30 seconds per page depending on your hardware
- **Network Usage**: Only final embeddings uploaded (~16KB per page, since 1024 patches × 128 bits = 16,384 bytes, vs ~1MB original)
- **Privacy**: Original PDFs and images stay on your local machine
- **Storage**: Raw images cached locally for similarity map generation

### Supported Operations

- ✅ **Upload Documents**: Add new PDFs to the system
- ✅ **Search Documents**: Query existing documents
- ✅ **View Documents**: Browse stored documents
- ❌ **Remove Documents**: _Not currently implemented_
- ❌ **Update Documents**: _Not currently implemented_

## 🔐 Authentication & Security

### 🛡️ **Current Security Implementation**

#### **SECURE Components:**

**Vespa Authentication (REMOTE)**

- **Token Authentication**: Bearer tokens for Vespa Cloud API access
- **mTLS Certificates**: Mutual TLS for enterprise security
- **Encrypted Communication**: HTTPS/TLS for all Vespa connections

**API Key Management (LOCAL)**

- **Environment Variables**: Sensitive keys stored in `.env` files
- **API Key Rotation**: Google Gemini supports key rotation
- **Local Storage**: Keys never transmitted except to authorized APIs

#### **LIMITED Security Components:**

**Session Management**

```python
# Basic UUID session tracking (FastHTML)
session["session_id"] = str(uuid.uuid4())

```

```typescript
// HTTP-only cookies (Next.js)
cookieStore.set(SESSION_KEY, newSessionId, {
  httpOnly: true,
  secure: process.env.NODE_ENV === "production",
  sameSite: "lax",
  maxAge: 60 * 60 * 24 * 30, // 30 days
});
```

**Basic Request Validation**

```python
# HTMX request validation
if "hx-request" not in request.headers:
    return RedirectResponse("/search")

# Parameter validation (JSONResponse comes from starlette.responses)
if not query:
    return JSONResponse({"error": "Query is required"}, status_code=400)
```

### ⚠️ **Security Limitations & Risks**

#### **MISSING Security Features:**

**❌ No API Authentication**

- Local API endpoints are **completely open**
- No rate limiting or abuse protection
- No user authentication or authorization
- Anyone can access `/fetch_results`, `/get_sim_map` endpoints

**❌ No Input Sanitization**

```python
# Raw user input passed directly to models
query = searchParams.get("query")  # No validation/sanitization
ranking = searchParams.get("ranking")  # No input filtering
```

**❌ No Security Headers**

- No CORS configuration
- No Content Security Policy (CSP)
- No X-Frame-Options protection
- No X-Content-Type-Options validation

**❌ No Rate Limiting**

- Unlimited API requests
- No protection against DoS attacks
- No query throttling or user limits

**❌ No CSRF Protection**

- No token validation for state-changing operations
- Cross-site request forgery possible
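
CSRF protection is not implemented in this project. One common way to add it is a signed token bound to the session, sketched below with the standard library. The function names and the `nonce.signature` token layout are assumptions for illustration; the secret would come from a setting such as `CSRF_SECRET`.

```python
import hashlib
import hmac
import secrets


def make_csrf_token(secret: str, session_id: str) -> str:
    """Return `nonce.signature`, where the signature binds the nonce to the session."""
    nonce = secrets.token_hex(16)
    msg = f"{session_id}:{nonce}".encode()
    sig = hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()
    return f"{nonce}.{sig}"


def verify_csrf_token(secret: str, session_id: str, token: str) -> bool:
    """Recompute the signature and compare in constant time."""
    nonce, sep, sig = token.partition(".")
    if not sep:
        return False
    msg = f"{session_id}:{nonce}".encode()
    expected = hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig.encode(), expected.encode())


token = make_csrf_token("your_random_csrf_secret_here", "session-123")
print(verify_csrf_token("your_random_csrf_secret_here", "session-123", token))    # True
print(verify_csrf_token("your_random_csrf_secret_here", "other-session", token))  # False
```

Binding the token to the session ID means a token issued to one visitor cannot be replayed under another visitor's session.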

### 🎯 **Security Recommendations**

#### **IMMEDIATE (High Priority)**

**1. Add API Authentication**

```typescript
// middleware.ts - Add API key validation
import type { NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  const apiKey = request.headers.get("X-API-Key");
  if (!apiKey || apiKey !== process.env.COLPALI_API_KEY) {
    return new Response("Unauthorized", { status: 401 });
  }
}
```

**2. Implement Rate Limiting**

```typescript
// Use next-rate-limit or similar
import rateLimit from "@/lib/rate-limit";

const limiter = rateLimit({
  interval: 60 * 1000, // 1 minute
  uniqueTokenPerInterval: 500, // Max number of distinct clients tracked per interval
});

await limiter.check(10, getClientIP(request)); // 10 requests per minute
```

**3. Add Security Headers**

```typescript
// next.config.js
const securityHeaders = [
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
  {
    key: "Content-Security-Policy",
    value: "default-src 'self'; script-src 'self' 'unsafe-inline'",
  },
];
```

**4. Input Validation & Sanitization**

```typescript
import { z } from "zod";

const SearchSchema = z.object({
  query: z
    .string()
    .min(1)
    .max(500)
    .regex(/^[a-zA-Z0-9\s\.\?\!]*$/),
  ranking: z.enum(["hybrid", "colpali", "bm25"]),
});
```

#### **MEDIUM Priority**

**5. CORS Configuration**

```typescript
// Restrict origins to known domains
const corsHeaders = {
  "Access-Control-Allow-Origin": "https://yourdomain.com",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization",
};
```

**6. Request Size Limits**

```typescript
// Limit request payload sizes
export const config = {
  api: {
    bodyParser: {
      sizeLimit: "1mb",
    },
  },
};
```

**7. Audit Logging**

```python
# Log all API access with IP, timestamp, and queries
logger.info(f"API_ACCESS: {client_ip} - {endpoint} - {query[:100]}")
```

#### **LONG-TERM (Production Ready)**

**8. User Authentication (Optional)**

```typescript
// Add NextAuth.js or similar for user accounts
// Implement role-based access control
// Add document ownership and permissions
```

**9. Network Security**

```bash
# Deploy behind reverse proxy (nginx/cloudflare)
# Enable DDoS protection
# Use Web Application Firewall (WAF)
```

**10. Data Privacy Controls**

```typescript
// Implement data retention policies
// Add user data deletion capabilities
// GDPR compliance features
```

### 🔒 **Security Best Practices**

#### **For LOCAL Development:**

- **Never commit API keys** to version control
- **Use specific environment variable names** (avoid generic names such as `API_KEY`)
- **Rotate API keys regularly** (monthly)
- **Enable firewall** on development machines
- **Use HTTPS even locally** for production testing

#### **For PRODUCTION Deployment:**

- **Deploy behind CDN/WAF** (Cloudflare, AWS Shield)
- **Enable rate limiting** at infrastructure level
- **Use container security scanning**
- **Implement monitoring and alerting**
- **Regular security audits and penetration testing**

#### **For REMOTE Services:**

- **Vespa Cloud**: Follows enterprise security standards
- **Gemini API**: Google-managed security and compliance
- **Environment Isolation**: Separate dev/staging/prod credentials

### 🚨 **Current Risk Level: MEDIUM**

**Suitable for:**

- ✅ **Personal projects and demos**
- ✅ **Internal company tools** (behind firewall)
- ✅ **Research and development** environments

**NOT suitable for:**

- ❌ **Public internet deployment**
- ❌ **Customer-facing applications**
- ❌ **Production environments** with sensitive data
- ❌ **Commercial applications** without security hardening

## 🎯 Usage Guide

### Basic Search

1. Navigate to the homepage
2. Enter your search query in natural language
3. Select ranking method (hybrid, semantic, etc.)
4. View results with similarity maps

### Similarity Maps

- Click on token buttons to see which parts of documents match specific query terms
- Visual heatmaps show attention patterns
- Reset button returns to original document view
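
As a rough picture of what a per-token map contains, the sketch below scores every patch against one query-token embedding and reshapes the result into a grid. It assumes the 1024 patches are stored row by row as a 32 × 32 grid; `token_similarity_map` and the random embeddings are illustrative, not the project's generator.

```python
import numpy as np


def token_similarity_map(token_emb: np.ndarray, patch_embs: np.ndarray, grid: int = 32) -> np.ndarray:
    """Score every patch against one query token and reshape to the patch grid.

    Assumes patches are stored row by row, so 1024 patches form a 32 x 32 grid.
    """
    scores = patch_embs @ token_emb                 # (patches,)
    heat = scores.reshape(grid, grid)
    # Min-max normalise to [0, 1] so the map can be drawn as an overlay.
    span = heat.max() - heat.min()
    return (heat - heat.min()) / span if span > 0 else np.zeros_like(heat)


rng = np.random.default_rng(0)
patches = rng.standard_normal((1024, 128))
token = patches[5 * 32 + 7]        # pretend the query token matches row 5, column 7

heatmap = token_similarity_map(token, patches)
row, col = np.unravel_index(heatmap.argmax(), heatmap.shape)
print(heatmap.shape, row, col)
```

The normalised grid can then be upscaled to the page image size and blended over it as a heatmap.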

### AI Chat

- Ask questions about retrieved documents
- Chat responses are based on document content
- Streaming responses for real-time interaction

### Search Rankings

- **Hybrid**: Combines multiple ranking signals
- **Semantic**: Pure semantic similarity
- **BM25**: Traditional text-based ranking
- **ColPali**: Visual-first ranking

## 🛠️ Development

### Project Structure

```
├── main.py                 # Application entry point
├── backend/
│   ├── colpali.py         # ColPali model integration
│   ├── vespa_app.py       # Vespa client and queries
│   └── modelmanager.py    # Model management utilities
├── frontend/
│   ├── app.py             # UI components
│   └── layout.py          # Layout templates
├── feed_vespa.py          # Document upload script
├── deploy_vespa_app.py    # Vespa deployment script
├── colpali-with-snippets/ # Vespa schema definitions
└── static/                # Static assets and generated files
```

### Running in Development

```bash
# Enable hot reload
export HOT_RELOAD=true
python main.py

# Or set in .env
echo "HOT_RELOAD=true" >> .env
```

### Code Quality

```bash
# Format code
ruff format .

# Lint code
ruff check .
```

## 📊 API Endpoints

### **Current API Routes (⚠️ UNSECURED)**

| Endpoint         | Method | Description             | Security Status  |
| ---------------- | ------ | ----------------------- | ---------------- |
| `/`              | GET    | Homepage                | ✅ Public (safe) |
| `/search`        | GET    | Search interface        | ✅ Public (safe) |
| `/fetch_results` | GET    | Fetch search results    | ⚠️ **OPEN API**  |
| `/get_sim_map`   | GET    | Get similarity maps     | ⚠️ **OPEN API**  |
| `/get-message`   | GET    | Chat with AI (SSE)      | ⚠️ **OPEN API**  |
| `/full_image`    | GET    | Get full document image | ⚠️ **OPEN API**  |
| `/suggestions`   | GET    | Query autocomplete      | ⚠️ **OPEN API**  |
| `/static/*`      | GET    | Static file serving     | ✅ Public (safe) |

### **Security Analysis by Endpoint**

#### **🔒 SECURE Endpoints**

- **`/`** and **`/search`**: Static HTML pages, no sensitive data
- **`/static/*`**: Public assets (CSS, JS, images)

#### **⚠️ UNSECURED Endpoints (Risk)**

**`/fetch_results`** - **HIGH RISK**

```bash
# Anyone can perform unlimited searches
curl "http://localhost:7860/fetch_results?query=secret&ranking=hybrid"
```

- **Risks**: Resource abuse, server overload, competitive intelligence gathering
- **Exposes**: Search capabilities, document metadata, processing times

**`/get_sim_map`** - **MEDIUM RISK**

```bash
# Access similarity maps without authentication
curl "http://localhost:7860/get_sim_map?query_id=123&idx=0&token=word&token_idx=5"
```

- **Risks**: Unauthorized access to visual analysis
- **Exposes**: Document visual patterns, query insights

**`/get-message`** - **HIGH RISK**

```bash
# Trigger AI processing without limits
curl "http://localhost:7860/get-message?query_id=123&query=question&doc_ids=doc1,doc2"
```

- **Risks**: Gemini API abuse, cost exploitation, resource exhaustion
- **Exposes**: AI-generated insights, document content analysis

**`/full_image`** - **HIGH RISK**

```bash
# Download any document image
curl "http://localhost:7860/full_image?doc_id=any_document_id"
```

- **Risks**: Unauthorized document access, data leakage
- **Exposes**: Full document images, potentially sensitive content

### **Immediate Security Fixes**

#### **1. Add API Key Authentication**

```python
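# Python FastHTML (Starlette) middleware; `app` is the existing application object
import os
import secrets

from starlette.responses import JSONResponse

PROTECTED_PATHS = ("/fetch_results", "/get_sim_map", "/get-message", "/full_image")

@app.middleware("http")
async def verify_api_key(request, call_next):
    if request.url.path.startswith(PROTECTED_PATHS):
        api_key = request.headers.get("X-API-Key", "")
        expected = os.getenv("COLPALI_API_KEY", "")
        # Constant-time comparison; reject everything when no key is configured
        if not expected or not secrets.compare_digest(api_key.encode(), expected.encode()):
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
    return await call_next(request)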
```

#### **2. Implement Rate Limiting**

```python
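from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
# slowapi looks up the limiter on app.state and needs a handler for 429 responses
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@rt("/fetch_results")
@limiter.limit("10/minute")  # 10 requests per minute per IP
async def get_results(request, query: str, ranking: str):
    ...  # existing code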
```
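
If a third-party limiter does not fit your setup, the same per-IP policy can be approximated with a small in-memory sliding window. This is a sketch only: `SlidingWindowLimiter` is a hypothetical class, its state lives in a single process, and it is not shared across workers or restarts.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Deterministic demo with a fake clock.
fake_now = [0.0]
limiter_demo = SlidingWindowLimiter(limit=10, window=60.0, clock=lambda: fake_now[0])
results = [limiter_demo.allow("203.0.113.7") for _ in range(12)]
fake_now[0] = 61.0
after_window = limiter_demo.allow("203.0.113.7")
print(results.count(True), after_window)
```

A request handler would call `allow(client_ip)` first and return HTTP 429 when it yields `False`.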

#### **3. Input Validation**

```python
from pydantic import BaseModel, validator

class SearchRequest(BaseModel):
    query: str
    ranking: str

    @validator('query')
    def query_must_be_safe(cls, v):
        if len(v) > 500:
            raise ValueError('Query too long')
        # Add sanitization logic
        return v.strip()
```
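
The model above leaves `ranking` unchecked. A dependency-free sketch that validates both parameters might look like the following; the set of ranking names is taken from the Zod schema earlier in this README, and `validate_search_params` is a hypothetical helper.

```python
ALLOWED_RANKINGS = {"hybrid", "colpali", "bm25"}   # assumed set of ranking names
MAX_QUERY_LENGTH = 500


def validate_search_params(query: str, ranking: str) -> tuple[str, str]:
    """Return cleaned (query, ranking) or raise ValueError."""
    cleaned = " ".join(query.split())          # trim and collapse whitespace
    if not cleaned:
        raise ValueError("Query is required")
    if len(cleaned) > MAX_QUERY_LENGTH:
        raise ValueError("Query too long")
    if not cleaned.isprintable():
        raise ValueError("Query contains control characters")
    if ranking not in ALLOWED_RANKINGS:
        raise ValueError(f"Unknown ranking: {ranking!r}")
    return cleaned, ranking


print(validate_search_params("  annual   revenue 2023 ", "hybrid"))
```

Checking `ranking` against an allow-list is stricter than sanitizing it, because only known values ever reach the search backend.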

#### **4. Request Origin Validation**

```python
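from starlette.responses import JSONResponse

ALLOWED_ORIGINS = ["http://localhost:3000", "https://yourdomain.com"]

@app.middleware("http")
async def cors_middleware(request, call_next):
    origin = request.headers.get("origin")
    # Same-origin navigations and non-browser clients send no Origin header,
    # so only reject requests that declare an origin outside the allow-list.
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return JSONResponse({"error": "Forbidden"}, status_code=403)
    return await call_next(request)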
```

### **📈 Recommended API Security Architecture**

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │  Rate Limiter   │    │   Backend API   │
│                 │    │                 │    │                 │
│ • API Key       │◄──►│ • IP Limiting   │◄──►│ • Input Valid.  │
│ • CORS Headers  │    │ • User Quotas   │    │ • Auth Checks   │
│ • Request Valid.│    │ • DoS Protection│    │ • Audit Logs    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

**Benefits:**

- **Layer 1**: Frontend validates requests before sending
- **Layer 2**: Rate limiter prevents abuse and DoS attacks
- **Layer 3**: Backend performs final validation and authorization

### **🔒 Security Implementation Checklist**

#### **Before Production Deployment:**

**CRITICAL (Must Do):**

- [ ] **Generate API Key**: Create strong API key for endpoint authentication
- [ ] **Enable Rate Limiting**: Implement per-IP request limits
- [ ] **Add Security Headers**: X-Frame-Options, CSP, X-Content-Type-Options
- [ ] **Input Validation**: Sanitize all user inputs (query, ranking)
- [ ] **CORS Configuration**: Restrict origins to known domains only
- [ ] **Environment Security**: Never commit API keys, use secure .env
- [ ] **HTTPS Only**: Force TLS in production (no HTTP)

**HIGH Priority:**

- [ ] **Audit Logging**: Log all API requests with IP and timestamp
- [ ] **Request Size Limits**: Prevent large payload attacks
- [ ] **Error Handling**: Don't expose stack traces or internal details
- [ ] **Session Security**: HTTP-only, secure, SameSite cookies
- [ ] **API Documentation**: Document authentication requirements

**MEDIUM Priority:**

- [ ] **User Authentication**: Consider adding user accounts for access control
- [ ] **Request Timeout**: Prevent long-running request abuse
- [ ] **Content Validation**: Verify response content types
- [ ] **Monitoring**: Set up alerts for unusual API usage patterns
- [ ] **Backup Strategy**: Secure backup of environment variables

#### **Security Testing Commands:**

**Test API Authentication:**

```bash
# Should fail without API key
curl "http://localhost:7860/fetch_results?query=test&ranking=hybrid"

# Should succeed with API key
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test&ranking=hybrid"
```

**Test Rate Limiting:**

```bash
# Run multiple requests to trigger rate limit
for i in {1..15}; do
  curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=test$i&ranking=hybrid"
  echo "Request $i"
done
```

**Test Input Validation:**

```bash
# Should reject invalid/malicious inputs
curl -H "X-API-Key: your_api_key" "http://localhost:7860/fetch_results?query=<script>alert('xss')</script>&ranking=invalid"
```

**Test Security Headers:**

```bash
# Check security headers in response
curl -I "http://localhost:7860/"
# Should see: X-Frame-Options, X-Content-Type-Options, etc.
```
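
The same check can be automated once the response headers are available as a mapping. The sketch below is illustrative: `missing_security_headers` is a hypothetical helper, and the expected list mirrors the headers recommended earlier in this README.

```python
from typing import Mapping

EXPECTED_HEADERS = (
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Content-Security-Policy",
)


def missing_security_headers(headers: Mapping[str, str]) -> list[str]:
    """Return the expected headers that are absent (names compared case-insensitively)."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_HEADERS if name.lower() not in present]


# Hypothetical response headers, e.g. parsed from `curl -I` output.
response_headers = {
    "content-type": "text/html; charset=utf-8",
    "x-frame-options": "DENY",
    "x-content-type-options": "nosniff",
}
print(missing_security_headers(response_headers))
```

An empty list means every recommended header is present; anything else names what still needs to be configured.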

#### **Security Monitoring:**

**Log Analysis Queries:**

```bash
# Monitor API usage patterns
grep "API_ACCESS" /var/log/colpali.log | tail -100

# Detect potential abuse
grep "RATE_LIMIT_EXCEEDED" /var/log/colpali.log

# Check authentication failures
grep "UNAUTHORIZED" /var/log/colpali.log
```
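
These `grep` commands only work if the application writes the corresponding markers. A minimal sketch of a logger that emits them is shown below; the logger name, the helper functions, and the in-memory buffer standing in for `/var/log/colpali.log` are all assumptions.

```python
import io
import logging


def build_audit_logger(stream) -> logging.Logger:
    """Create a logger whose lines carry the markers used by the grep commands."""
    logger = logging.getLogger("colpali.audit")   # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_access(logger, client_ip: str, endpoint: str, query: str) -> None:
    # Truncate the query and strip newlines so one request stays on one log line.
    safe_query = query[:100].replace("\n", " ").replace("\r", " ")
    logger.info("API_ACCESS: %s - %s - %s", client_ip, endpoint, safe_query)


def log_rejection(logger, marker: str, client_ip: str, endpoint: str) -> None:
    # marker is "UNAUTHORIZED" or "RATE_LIMIT_EXCEEDED"
    logger.warning("%s: %s - %s", marker, client_ip, endpoint)


buffer = io.StringIO()          # stands in for /var/log/colpali.log
audit = build_audit_logger(buffer)
log_access(audit, "203.0.113.7", "/fetch_results", "quarterly report")
log_rejection(audit, "RATE_LIMIT_EXCEEDED", "203.0.113.7", "/fetch_results")
print(buffer.getvalue())
```

In a real deployment the `StreamHandler` would be replaced by a file or syslog handler pointing at the log path used above.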

**Alerting Setup:**

- **Rate Limit Violations**: Alert when >50 requests/minute from single IP
- **Authentication Failures**: Alert on repeated unauthorized attempts
- **Unusual Queries**: Alert on suspicious query patterns or injection attempts
- **Resource Usage**: Alert on high CPU/memory usage (potential DoS)

## 🧪 Models Used

- **ColPali v1.2**: Visual document understanding
- **ColPaliGemma 3B**: Base visual-language model
- **Google Gemini 2.0**: AI chat and question answering

## 🔧 Configuration Options

### Environment Variables

| Variable                   | Required | Description                                 | Security Impact                     |
| -------------------------- | -------- | ------------------------------------------- | ----------------------------------- |
| `VESPA_APP_TOKEN_URL`      | Yes\*    | Vespa application URL (token auth)          | **HIGH** - Remote access            |
| `VESPA_CLOUD_SECRET_TOKEN` | Yes\*    | Vespa secret token                          | **CRITICAL** - Full database access |
| `USE_MTLS`                 | No       | Use mTLS instead of token auth              | **MEDIUM** - Auth method            |
| `VESPA_APP_MTLS_URL`       | Yes\*\*  | Vespa application URL (mTLS)                | **HIGH** - Remote access            |
| `VESPA_CLOUD_MTLS_KEY`     | Yes\*\*  | mTLS private key                            | **CRITICAL** - TLS credentials      |
| `VESPA_CLOUD_MTLS_CERT`    | Yes\*\*  | mTLS certificate                            | **HIGH** - TLS credentials          |
| `GEMINI_API_KEY`           | No       | Google Gemini API key                       | **HIGH** - AI access/costs          |
| `LOG_LEVEL`                | No       | Logging level (DEBUG, INFO, WARNING, ERROR) | **LOW** - Debug info                |
| `HOT_RELOAD`               | No       | Enable hot reload in development            | **LOW** - Dev convenience           |

#### **🔒 Security-Related Environment Variables (Recommended)**

| Variable                   | Required      | Description                                                | Default |
| -------------------------- | ------------- | ---------------------------------------------------------- | ------- |
| `COLPALI_API_KEY`          | **YES\*\*\*** | API key for endpoint authentication                        | None    |
| `ALLOWED_ORIGINS`          | **YES\*\*\*** | Comma-separated allowed CORS origins                       | None    |
| `RATE_LIMIT_REQUESTS`      | No            | Max requests per IP within each rate limit window          | `10`    |
| `RATE_LIMIT_WINDOW`        | No            | Rate limit window in seconds                               | `60`    |
| `MAX_QUERY_LENGTH`         | No            | Maximum query string length (characters)                   | `500`   |
| `ENABLE_AUDIT_LOGGING`     | No            | Log all API requests for security auditing                 | `false` |
| `SECURITY_HEADERS_ENABLED` | No            | Enable security headers                                    | `true`  |
| `CSRF_SECRET`              | **YES\*\*\*** | Secret for CSRF token generation                           | None    |
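
To illustrate how `RATE_LIMIT_REQUESTS` and `RATE_LIMIT_WINDOW` interact, here is a minimal in-memory sliding-window limiter. It is a sketch under assumptions, not the application's actual middleware: the class name `SlidingWindowRateLimiter` and its `allow` method are hypothetical, and state is kept per process, so it would not be shared across multiple workers.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests=10, window_seconds=60, clock=time.monotonic):
        self.max_requests = max_requests      # mirrors RATE_LIMIT_REQUESTS
        self.window_seconds = window_seconds  # mirrors RATE_LIMIT_WINDOW
        self.clock = clock                    # injectable for testing
        self._hits = defaultdict(deque)       # client_ip -> request times

    def allow(self, client_ip):
        now = self.clock()
        hits = self._hits[client_ip]
        # Forget requests that are older than the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # caller would typically respond with HTTP 429
        hits.append(now)
        return True
```

With the defaults, the eleventh request from one IP inside 60 seconds is rejected, and requests are accepted again once the oldest ones age out of the window.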

**Example Security-Enhanced `.env`:**

```bash
# Existing configuration
VESPA_APP_TOKEN_URL=https://your-app.vespa-cloud.com
VESPA_CLOUD_SECRET_TOKEN=your_vespa_secret_token
GEMINI_API_KEY=your_gemini_api_key

# NEW: Security configuration
COLPALI_API_KEY=your_strong_random_api_key_here
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
RATE_LIMIT_REQUESTS=10
RATE_LIMIT_WINDOW=60
MAX_QUERY_LENGTH=500
ENABLE_AUDIT_LOGGING=true
SECURITY_HEADERS_ENABLED=true
CSRF_SECRET=your_random_csrf_secret_here

# Development vs Production
# Enable secure cookies
NODE_ENV=production
# Don't expose debug info in production
LOG_LEVEL=INFO
```

\*Required for token authentication  
\*\*Required for mTLS authentication  
\*\*\*Required for production security
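
One way to catch a missing or malformed security setting at startup is to validate the variables once and fail fast. The sketch below uses the variable names and defaults from the table above; the `SecurityConfig` dataclass, the `load_security_config` function, and the `production` flag are hypothetical names introduced for illustration, not part of the existing codebase.

```python
import os
from dataclasses import dataclass
from typing import List, Optional

# Variables marked as required for production security in the table above.
REQUIRED_IN_PRODUCTION = ("COLPALI_API_KEY", "ALLOWED_ORIGINS", "CSRF_SECRET")


@dataclass(frozen=True)
class SecurityConfig:
    api_key: Optional[str]
    allowed_origins: List[str]
    rate_limit_requests: int
    rate_limit_window: int
    max_query_length: int
    audit_logging: bool
    security_headers: bool
    csrf_secret: Optional[str]


def _as_bool(value):
    return value.strip().lower() in ("1", "true", "yes", "on")


def load_security_config(env=os.environ, production=True):
    """Read security settings from `env`, applying the documented defaults."""
    missing = [name for name in REQUIRED_IN_PRODUCTION if not env.get(name)]
    if production and missing:
        raise ValueError("Missing required security settings: " + ", ".join(missing))
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip()]
    return SecurityConfig(
        api_key=env.get("COLPALI_API_KEY"),
        allowed_origins=origins,
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "10")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
        max_query_length=int(env.get("MAX_QUERY_LENGTH", "500")),
        audit_logging=_as_bool(env.get("ENABLE_AUDIT_LOGGING", "false")),
        security_headers=_as_bool(env.get("SECURITY_HEADERS_ENABLED", "true")),
        csrf_secret=env.get("CSRF_SECRET"),
    )
```

For the secret values themselves, a random string such as the output of `python -c "import secrets; print(secrets.token_urlsafe(32))"` is a reasonable starting point.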

## 🚨 Troubleshooting

### **LOCAL Processing Issues**

**ColPali model fails to load:**

```bash
# Check GPU memory
nvidia-smi  # For NVIDIA GPUs
# or
system_profiler SPDisplaysDataType  # For Apple Silicon

# Clear model cache if corrupted
rm -rf ~/.cache/huggingface/hub/models--vidore--colpali-v1.2
```

**Out of memory errors:**

- Reduce batch size in `feed_vespa.py` (try `batch_size=1`)
- Close other applications to free RAM/VRAM
- Use CPU processing if GPU memory is insufficient: `CUDA_VISIBLE_DEVICES="" python main.py`
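
Reducing the batch size can also be automated. The following is a generic sketch, not code from `feed_vespa.py`: `process_with_backoff` and `process_batch` are hypothetical names, and the exception types to treat as out-of-memory are passed in by the caller (with a recent PyTorch release this might be `torch.cuda.OutOfMemoryError`, which is an assumption about your setup).

```python
def process_with_backoff(items, process_batch, batch_size=4, oom_errors=(MemoryError,)):
    """Process `items` in batches, halving the batch size after a memory error.

    `process_batch` takes a list of items and returns a list of results.
    """
    results = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(process_batch(batch))
        except oom_errors:
            if batch_size == 1:
                raise  # even a single item does not fit; give up
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with a smaller batch
        start += len(batch)
    return results
```

Once the batch size has been reduced it stays reduced for the rest of the run, which avoids hitting the same memory error repeatedly.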

**Slow processing on CPU:**

- This is expected: ColPali inference is compute-intensive
- Consider moving to a GPU or Apple Silicon for a 5-10x speedup
- Process documents overnight for large collections

### **REMOTE Processing Issues**

**Connection to Vespa fails:**

- Verify your Vespa URL and credentials in `.env`
- Check if the Vespa application is deployed and running
- Check network connectivity: `ping your-app.vespa-cloud.com` (ICMP may be blocked, so a failed ping alone is not conclusive)
- Confirm that authentication tokens haven't expired
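
Because `ping` can be blocked even when the service is reachable, a TCP-level check is sometimes more informative. The sketch below only tests whether a connection to the host and port can be opened; it does not validate credentials or confirm that the Vespa application is healthy. The function name `can_reach` is hypothetical, and port 443 is an assumption based on the `https://` URL in the example `.env`.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_reach("your-app.vespa-cloud.com")` returning `False` points to a DNS, firewall, or deployment problem rather than an authentication problem.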

**Document upload fails:**

- Check Vespa Cloud storage quota and billing
- Verify embedding format matches Vespa schema
- Ensure stable internet connection for large uploads

**Search returns no results:**

- Confirm documents were successfully uploaded to Vespa
- Check if embeddings were properly indexed
- Verify query processing isn't failing locally

### **MIXED (Local + Remote) Issues**

**Chat features don't work:**

- **LOCAL**: Verify document images are being generated locally
- **REMOTE**: Check `GEMINI_API_KEY` is set correctly
- **REMOTE**: Verify Gemini API quota and billing
- **NETWORK**: Ensure images can be sent to Gemini API

**Similarity maps missing:**

- **LOCAL**: Confirm ColPali model loaded successfully
- **LOCAL**: Check if similarity map generation completed
- **REMOTE**: Verify Vespa returned similarity data
- **BROWSER**: Clear browser cache for static files

### Performance Tips

**LOCAL Optimization:**

- Use GPU acceleration for 5-10x faster model inference
- Optimize batch sizes based on available memory
- Use SSD storage for faster model loading
- Consider quantized models for lower memory usage

**REMOTE Optimization:**

- Use Vespa's HNSW indexing for faster search
- Tune the tradeoff between embedding dimensions and accuracy
- Enable compression for faster network transfer
- Use multiple Vespa instances for high availability

**NETWORK Optimization:**

- Process documents in batches to reduce upload overhead
- Use compression for embedding transfer
- Consider regional Vespa deployment for lower latency
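
Batching and compression can be combined in a few lines. This is a minimal sketch using only the standard library; the helper names are hypothetical, the document shape is illustrative, and whether your Vespa endpoint accepts gzip-encoded request bodies is an assumption you should verify before relying on it. Dense floating-point embeddings often compress less well than the metadata around them.

```python
import gzip
import json


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def compress_batch(batch):
    """Serialize a batch to compact JSON and gzip it."""
    raw = json.dumps(batch, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decompress_batch(payload):
    """Inverse of `compress_batch`."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

Each compressed batch would then be sent in a single request, which reduces per-request overhead compared with uploading documents one at a time.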

## 📄 License

Apache-2.0

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests and linting
5. Submit a pull request

## 📞 Support

For issues and questions:

- Check the troubleshooting section
- Review Vespa and ColPali documentation
- Open an issue on the repository