Commit · 34cedd8
1 Parent(s): 1e32a60
Add support for reasoning trace display from NuMarkdown-8B-Thinking model
- Created ReasoningParser module to detect and parse <think>/<answer> tags
- Added collapsible reasoning panel UI with formatted step display
- Automatically separates reasoning from final output for cleaner view
- Shows reasoning statistics (word count, percentage of output)
- Added india-medical-ocr-test dataset to examples
- Styled reasoning sections with dark mode support
- Includes reasoning trace indicator badge in statistics panel
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
- CLAUDE.md +89 -2
- css/styles.css +49 -0
- index.html +68 -2
- js/app.js +65 -3
- js/reasoning-parser.js +224 -0
- linkedin-post.txt +18 -0
- mobile-enhancement-plan.md +237 -0
- multi-ocr-comparison-ui-patterns.md +277 -0
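
For a concrete sense of what "separates reasoning from final output" means here, below is a minimal sketch. The sample string is invented; the real parsing lives in the new js/reasoning-parser.js further down this diff.

```javascript
// Hypothetical NuMarkdown-style output: reasoning wrapped in <think>, final OCR in <answer>
const sample =
  '<think>1. **Read the header** The scan shows a patient intake form.</think>\n' +
  '<answer># Patient Intake\n\nName: ____\nDate: ____</answer>';

// Minimal split, equivalent in spirit to what the new parser does
const reasoning = sample.match(/<think>([\s\S]*?)<\/think>/i)?.[1].trim();
const answer = sample.match(/<answer>([\s\S]*?)<\/answer>/i)?.[1].trim();

console.log(reasoning); // the numbered reasoning step, without the <think> tags
console.log(answer);    // the final markdown output, without the <answer> tags
```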
CLAUDE.md
CHANGED
@@ -6,6 +6,32 @@ This file provides guidance to Claude Code (claude.ai/code) when working with th
 
 OCR Text Explorer is a modern, standalone web application for browsing and comparing OCR text improvements in HuggingFace datasets. Built as a lightweight alternative to the Gradio-based OCR Time Machine, it focuses specifically on exploring pre-OCR'd datasets with enhanced user experience.
 
+## Recent Updates
+
+### Markdown Rendering Support (Added 2025-08-01)
+
+The application now supports rendering markdown-formatted VLM output for improved readability:
+
+**Features:**
+- Automatic markdown detection in improved OCR text
+- Toggle button to switch between raw markdown and rendered view
+- Support for common markdown elements: headers, lists, tables, code blocks, links
+- Security-focused implementation with XSS prevention
+- Performance optimization with render caching
+
+**Implementation Details:**
+- Uses marked.js library for markdown parsing
+- Custom renderers for security (sanitizes URLs, prevents script injection)
+- Tailwind-styled markdown elements matching the app's design
+- HTML table support for VLM outputs that use table tags
+- Cache system limits memory usage to 50 rendered items
+
+**UI Changes:**
+- Markdown toggle button appears when markdown is detected
+- "Markdown Detected" badge in statistics panel
+- New "Markdown Diff" mode showing plain vs rendered comparison
+- Both "Improved Only" and "Side by Side" views support rendering
+
 ## Architecture
 
 ### Technology Stack
@@ -123,6 +149,23 @@ case 'your_key':
 // Dark mode: bg-red-950, text-red-300
 ```
 
+### Working with Markdown Rendering
+```javascript
+// Enable/disable markdown rendering
+this.renderMarkdown = true; // Toggle markdown rendering
+
+// Add new markdown patterns to detection
+// In app.js detectMarkdown() method
+const markdownPatterns = [
+  /your_pattern_here/, // Add your pattern
+  // ... existing patterns
+];
+
+// Customize markdown styles
+// In app.js renderMarkdownText() method
+html = html.replace(/<your_element>/g, '<your_element class="your-tailwind-classes">');
+```
+
 ## Performance Optimizations
 
 1. **Direct Dataset Indexing**: Uses `dataset[index]` instead of loading batches into memory
@@ -146,8 +189,33 @@ case 'your_key':
 **Cause**: Signed URLs expire after ~1 hour
 **Fix**: Implemented handleImageError() with automatic URL refresh
 
+### Issue: Markdown tables not rendering
+**Cause**: Default marked.js settings and HTML security restrictions
+**Fix**:
+- Enabled `tables: true` in marked.js options
+- Added safe HTML table tag allowlist in renderer
+- Applied proper Tailwind CSS classes to table elements
+- Added CSS overrides for prose container compatibility
+
+## Mobile Support Status
+
+While the application claims responsive design, the current mobile support is limited. A comprehensive mobile enhancement is planned but not yet implemented. See [mobile-enhancement-plan.md](mobile-enhancement-plan.md) for detailed technical requirements and implementation approach.
+
+**Current limitations:**
+- Fixed desktop layout doesn't adapt well to small screens
+- No touch gesture support for navigation
+- Small touch targets for buttons and inputs
+- Desktop-only interactions (hover states, keyboard shortcuts)
+
+**Planned improvements:**
+- Responsive stacked layout for mobile devices
+- Touch gestures (swipe for navigation)
+- Mobile-optimized navigation bar
+- Touch-friendly UI components
+
 ## Future Enhancements
 
+- [ ] Comprehensive mobile support (see mobile-enhancement-plan.md)
 - [ ] Search/filter within dataset
 - [ ] Bookmark favorite samples
 - [ ] Export selected texts
@@ -178,9 +246,28 @@ npx serve .
 ## Testing Datasets
 
 Known working datasets:
-- `davanstrien/exams-ocr` - Default dataset with
+- `davanstrien/exams-ocr` - Default dataset with exam papers (uses `text` and `markdown` columns)
+- `davanstrien/rolm-test` - Victorian theatre playbills processed with RolmOCR (uses `text` and `rolmocr_text` columns, includes `inference_info` metadata)
 - Any dataset with image + text columns
 
 Column patterns automatically detected:
 - Original: `text`, `ocr`, `original_text`, `ground_truth`
-- Improved: `markdown`, `new_ocr`, `corrected_text`, `vlm_ocr`
+- Improved: `markdown`, `new_ocr`, `corrected_text`, `vlm_ocr`, `rolmocr_text`
+- Metadata: `inference_info` (JSON array with model details, processing date, parameters)
+
+## Recent Updates
+
+### Model Information Display (Added 2025-08-04)
+
+The application now displays model processing information when available:
+
+**Features:**
+- Automatic detection of `inference_info` column
+- Model metadata panel showing: model name, processing date, batch size, max tokens
+- Link to processing script when available
+- Positioned prominently below image for immediate visibility
+
+**Implementation Notes:**
+- The model info panel only appears when `inference_info` column exists
+- Supports datasets processed with UV scripts via HF Jobs
+- Gracefully handles datasets without model metadata
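
The "Implementation Details" above mention custom marked.js renderers that sanitize URLs and block script injection. As a rough sketch only (the actual renderer code in js/app.js is not part of this diff, and the override style assumes a marked v4-era `marked.use` API):

```javascript
// Sketch of URL sanitisation via a custom marked.js renderer override.
// Assumes a marked v4-style API (marked.use({ renderer: {...} })); the app's real code may differ.
const SAFE_PROTOCOLS = ['http:', 'https:', 'mailto:'];

function sanitizeUrl(href) {
  try {
    const url = new URL(href, window.location.origin);
    return SAFE_PROTOCOLS.includes(url.protocol) ? url.href : '#';
  } catch {
    return '#'; // unparsable URLs are neutralised
  }
}

marked.use({
  renderer: {
    link(href, title, text) {
      return `<a href="${sanitizeUrl(href)}" rel="noopener noreferrer">${text}</a>`;
    },
    html() {
      return ''; // drop raw HTML unless it is explicitly allowlisted (e.g. table tags)
    }
  }
});
```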
css/styles.css
CHANGED
@@ -48,6 +48,55 @@ body {
   word-break: break-word;
 }
 
+/* Reasoning trace styling */
+.reasoning-panel {
+  @apply bg-gradient-to-r from-blue-50 to-indigo-50 dark:from-blue-950/20 dark:to-indigo-950/20;
+  @apply border-l-4 border-blue-500 dark:border-blue-400;
+}
+
+.reasoning-step {
+  @apply transition-all hover:bg-gray-50 dark:hover:bg-gray-800/50 rounded-md p-2 -m-2;
+}
+
+.reasoning-step-number {
+  @apply inline-flex items-center justify-center w-7 h-7;
+  @apply bg-gradient-to-br from-blue-500 to-indigo-600;
+  @apply text-white text-xs font-bold rounded-full;
+  @apply shadow-sm;
+}
+
+.reasoning-step-title {
+  @apply font-semibold text-gray-900 dark:text-gray-100;
+  @apply border-b border-gray-200 dark:border-gray-700 pb-1 mb-2;
+}
+
+.reasoning-step-content {
+  @apply text-sm text-gray-700 dark:text-gray-300;
+  @apply leading-relaxed;
+}
+
+/* Collapse animation for reasoning panel */
+[x-collapse] {
+  overflow: hidden;
+  transition: max-height 0.3s ease-out;
+}
+
+[x-collapse].collapsed {
+  max-height: 0;
+}
+
+/* Reasoning trace indicators */
+.reasoning-indicator {
+  @apply animate-pulse;
+}
+
+.reasoning-badge {
+  @apply inline-flex items-center px-3 py-1 rounded-full text-xs font-medium;
+  @apply bg-gradient-to-r from-blue-100 to-indigo-100 dark:from-blue-900 dark:to-indigo-900;
+  @apply text-blue-800 dark:text-blue-200;
+  @apply border border-blue-200 dark:border-blue-700;
+}
+
 /* Keyboard hint styling */
 kbd {
   @apply inline-block px-2 py-1 text-xs font-semibold text-gray-800 bg-gray-100 border border-gray-300 rounded dark:bg-gray-700 dark:text-gray-200 dark:border-gray-600;
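
Alpine's `x-collapse` plugin normally drives the `[x-collapse]` rules above; purely as an illustration, here is a plain-JS toggle that animates the same `max-height` transition (the element IDs are hypothetical, not markup from the app):

```javascript
// Plain-JS sketch of the collapse behaviour the [x-collapse] CSS above supports.
// In the app itself Alpine handles this; #reasoning-body and #toggle-reasoning are invented IDs.
function toggleReasoningPanel(panel) {
  const collapsing = !panel.classList.contains('collapsed');
  panel.style.maxHeight = panel.scrollHeight + 'px'; // pin the current content height
  if (collapsing) {
    requestAnimationFrame(() => { panel.style.maxHeight = '0'; }); // animate down to zero
  }
  panel.classList.toggle('collapsed', collapsing);
}

document.querySelector('#toggle-reasoning')?.addEventListener('click', () => {
  toggleReasoningPanel(document.querySelector('#reasoning-body'));
});
```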
index.html
CHANGED
@@ -314,13 +314,19 @@
               <span x-text="wordStats.original || '-'"></span> → <span x-text="wordStats.improved || '-'"></span>
             </span>
           </div>
-          <div
-            <span class="inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-medium bg-purple-100 dark:bg-purple-900 text-purple-800 dark:text-purple-200">
+          <div class="mt-2 flex items-center justify-center space-x-2">
+            <span x-show="hasMarkdown" class="inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-medium bg-purple-100 dark:bg-purple-900 text-purple-800 dark:text-purple-200">
               <svg class="w-3 h-3 mr-1" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                 <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z"></path>
               </svg>
               Markdown Detected
             </span>
+            <span x-show="hasReasoningTrace" class="inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-medium bg-blue-100 dark:bg-blue-900 text-blue-800 dark:text-blue-200">
+              <svg class="w-3 h-3 mr-1" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z"></path>
+              </svg>
+              Reasoning Trace
+            </span>
           </div>
         </div>
       </div>
@@ -390,6 +396,65 @@
 
         <!-- Improved Only -->
         <div x-show="activeTab === 'improved'" class="max-w-none">
+          <!-- Reasoning Trace Panel -->
+          <div x-show="hasReasoningTrace" class="mb-4">
+            <div class="bg-blue-50 dark:bg-blue-950/20 border border-blue-200 dark:border-blue-800 rounded-lg">
+              <button
+                @click="showReasoning = !showReasoning"
+                class="w-full px-4 py-3 flex items-center justify-between text-left hover:bg-blue-100 dark:hover:bg-blue-950/40 transition-colors rounded-t-lg"
+              >
+                <div class="flex items-center space-x-2">
+                  <svg class="w-5 h-5 text-blue-600 dark:text-blue-400" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                    <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z"></path>
+                  </svg>
+                  <span class="font-medium text-gray-900 dark:text-gray-100">Model Reasoning</span>
+                  <span class="text-sm text-gray-600 dark:text-gray-400" x-show="reasoningStats">
+                    (<span x-text="reasoningStats?.reasoningWords"></span> words, <span x-text="reasoningStats?.reasoningRatio"></span>% of output)
+                  </span>
+                </div>
+                <svg
+                  class="w-5 h-5 text-gray-500 dark:text-gray-400 transition-transform"
+                  :class="showReasoning ? 'rotate-180' : ''"
+                  fill="none" stroke="currentColor" viewBox="0 0 24 24"
+                >
+                  <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M19 9l-7 7-7-7"></path>
+                </svg>
+              </button>
+
+              <div x-show="showReasoning" x-collapse class="px-4 pb-4">
+                <div class="bg-white dark:bg-gray-800 rounded-lg p-4 mt-2">
+                  <template x-if="formattedReasoning && formattedReasoning.steps.length > 0">
+                    <div class="space-y-3">
+                      <template x-for="(step, index) in formattedReasoning.steps" :key="index">
+                        <div class="pl-4 border-l-2 border-gray-200 dark:border-gray-700">
+                          <div class="font-medium text-sm text-gray-900 dark:text-gray-100 mb-1">
+                            <span class="inline-block w-6 h-6 bg-blue-100 dark:bg-blue-900 text-blue-600 dark:text-blue-400 rounded-full text-center text-xs leading-6 mr-2" x-text="step.number || (index + 1)"></span>
+                            <span x-text="step.title"></span>
+                          </div>
+                          <div class="text-sm text-gray-700 dark:text-gray-300 whitespace-pre-wrap" x-text="step.content"></div>
+                        </div>
+                      </template>
+                    </div>
+                  </template>
+
+                  <template x-if="!formattedReasoning || formattedReasoning.steps.length === 0">
+                    <pre class="whitespace-pre-wrap font-mono text-xs text-gray-700 dark:text-gray-300" x-text="reasoningContent"></pre>
+                  </template>
+                </div>
+              </div>
+            </div>
+          </div>
+
+          <!-- Final Answer Content -->
+          <div x-show="hasReasoningTrace" class="mb-2">
+            <div class="flex items-center space-x-2 text-sm text-gray-600 dark:text-gray-400 mb-2">
+              <svg class="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z"></path>
+              </svg>
+              <span>Final Output</span>
+            </div>
+          </div>
+
           <div x-show="!renderMarkdown">
             <pre class="whitespace-pre-wrap font-mono text-xs bg-gray-50 dark:bg-gray-800 text-gray-900 dark:text-gray-100 p-4 rounded-lg" x-text="getImprovedText()"></pre>
           </div>
@@ -532,6 +597,7 @@
     <!-- Local Scripts -->
     <script src="js/diff-utils.js"></script>
     <script src="js/dataset-api.js"></script>
+    <script src="js/reasoning-parser.js"></script>
     <script src="js/app.js"></script>
   </body>
 </html>
js/app.js
CHANGED
@@ -12,7 +12,8 @@ document.addEventListener('alpine:init', () => {
     // Example datasets
     exampleDatasets: [
       { id: 'davanstrien/exams-ocr', name: 'Exams OCR', description: 'Historical exam papers with VLM corrections' },
-      { id: 'davanstrien/rolm-test', name: 'ROLM Test', description: 'Documents processed with RolmOCR model' }
+      { id: 'davanstrien/rolm-test', name: 'ROLM Test', description: 'Documents processed with RolmOCR model' },
+      { id: 'davanstrien/india-medical-ocr-test', name: 'India Medical OCR', description: 'Medical documents with NuMarkdown reasoning traces' }
     ],
 
     // Navigation state
@@ -33,6 +34,14 @@ document.addEventListener('alpine:init', () => {
     renderMarkdown: false,
     hasMarkdown: false,
 
+    // Reasoning trace state
+    hasReasoningTrace: false,
+    showReasoning: false,
+    reasoningContent: null,
+    answerContent: null,
+    reasoningStats: null,
+    formattedReasoning: null,
+
     // Flow view state
     flowItems: [],
     flowStartIndex: 0,
@@ -190,9 +199,10 @@ document.addEventListener('alpine:init', () => {
       console.log('Column info:', this.columnInfo);
       console.log('Current sample keys:', Object.keys(this.currentSample));
 
-      // Check if improved text contains markdown
+      // Check if improved text contains markdown and reasoning traces
       const improvedText = this.getImprovedText();
-      this.
+      this.parseReasoningTrace(improvedText);
+      this.hasMarkdown = this.detectMarkdown(this.answerContent || improvedText);
 
       // Update diff when sample changes
       this.updateDiff();
@@ -279,6 +289,38 @@ document.addEventListener('alpine:init', () => {
       };
     },
 
+    parseReasoningTrace(text) {
+      // Reset reasoning state
+      this.hasReasoningTrace = false;
+      this.reasoningContent = null;
+      this.answerContent = null;
+      this.reasoningStats = null;
+      this.formattedReasoning = null;
+
+      if (!text || !window.ReasoningParser) return;
+
+      // Check if text contains reasoning trace
+      if (ReasoningParser.detectReasoningTrace(text)) {
+        const parsed = ReasoningParser.parseReasoningContent(text);
+
+        if (parsed.hasReasoning) {
+          this.hasReasoningTrace = true;
+          this.reasoningContent = parsed.reasoning;
+          this.answerContent = parsed.answer;
+          this.formattedReasoning = ReasoningParser.formatReasoningSteps(parsed.reasoning);
+          this.reasoningStats = ReasoningParser.getReasoningStats(parsed);
+
+          console.log('Reasoning trace detected:', this.reasoningStats);
+        } else {
+          // No reasoning found, use original text as answer
+          this.answerContent = text;
+        }
+      } else {
+        // No reasoning markers, use original text
+        this.answerContent = text;
+      }
+    },
+
     getOriginalText() {
       if (!this.currentSample) return '';
       const columns = this.api.detectColumns(null, this.currentSample);
@@ -286,6 +328,17 @@ document.addEventListener('alpine:init', () => {
     },
 
     getImprovedText() {
+      if (!this.currentSample) return '';
+      const columns = this.api.detectColumns(null, this.currentSample);
+      const rawText = this.currentSample[columns.improvedText] || 'No improved text found';
+
+      // If we have parsed answer content from reasoning trace, use that
+      // Otherwise return the raw text
+      return this.hasReasoningTrace && this.answerContent ? this.answerContent : rawText;
+    },
+
+    getRawImprovedText() {
+      // Get the raw improved text without parsing reasoning traces
       if (!this.currentSample) return '';
       const columns = this.api.detectColumns(null, this.currentSample);
       return this.currentSample[columns.improvedText] || 'No improved text found';
@@ -564,6 +617,15 @@ document.addEventListener('alpine:init', () => {
       content += `${'='.repeat(50)}\n`;
       content += original;
       content += `\n\n${'='.repeat(50)}\n\n`;
+
+      // Include reasoning trace if available
+      if (this.hasReasoningTrace && this.reasoningContent) {
+        content += `MODEL REASONING:\n`;
+        content += `${'='.repeat(50)}\n`;
+        content += this.reasoningContent;
+        content += `\n\n${'='.repeat(50)}\n\n`;
+      }
+
       content += `IMPROVED OCR:\n`;
       content += `${'='.repeat(50)}\n`;
       content += improved;
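
`detectMarkdown()` is referenced in the hunk above (and in the CLAUDE.md snippet) but its body is not part of this diff; a minimal sketch of pattern-based detection in the same spirit, with an illustrative pattern list rather than the app's actual one:

```javascript
// Illustrative markdown detection by pattern matching; not the app's exact pattern list.
function detectMarkdown(text) {
  if (!text) return false;
  const markdownPatterns = [
    /^#{1,6}\s+\S/m,        // headers
    /^\s*[-*+]\s+\S/m,      // unordered lists
    /^\s*\d+\.\s+\S/m,      // ordered lists
    /\|.+\|.+\|/,           // tables
    /`{3}[\s\S]*?`{3}/,     // fenced code blocks
    /\[.+?\]\(.+?\)/        // links
  ];
  return markdownPatterns.some((pattern) => pattern.test(text));
}

console.log(detectMarkdown('# Heading\n\n- item one\n- item two')); // true
console.log(detectMarkdown('plain OCR text with no markup'));       // false
```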
js/reasoning-parser.js
ADDED
@@ -0,0 +1,224 @@
/**
 * Reasoning Trace Parser
 * Handles parsing and formatting of model reasoning traces from OCR outputs
 */

class ReasoningParser {
  /**
   * Detect if text contains reasoning trace markers
   * @param {string} text - The text to check
   * @returns {boolean} - True if reasoning trace is detected
   */
  static detectReasoningTrace(text) {
    if (!text || typeof text !== 'string') return false;

    // Check for common reasoning trace patterns
    const patterns = [
      /<think>/i,
      /<thinking>/i,
      /<reasoning>/i,
      /<thought>/i
    ];

    return patterns.some(pattern => pattern.test(text));
  }

  /**
   * Parse reasoning content from text
   * @param {string} text - The text containing reasoning trace
   * @returns {object} - Object with reasoning and answer sections
   */
  static parseReasoningContent(text) {
    if (!text) {
      return { reasoning: null, answer: null, original: text };
    }

    // Try multiple patterns for flexibility
    const patterns = [
      {
        start: /<think>/i,
        end: /<\/think>/i,
        answerStart: /<answer>/i,
        answerEnd: /<\/answer>/i
      },
      {
        start: /<thinking>/i,
        end: /<\/thinking>/i,
        answerStart: /<answer>/i,
        answerEnd: /<\/answer>/i
      },
      {
        start: /<reasoning>/i,
        end: /<\/reasoning>/i,
        answerStart: /<output>/i,
        answerEnd: /<\/output>/i
      }
    ];

    for (const pattern of patterns) {
      const reasoningMatch = text.match(new RegExp(
        pattern.start.source + '([\\s\\S]*?)' + pattern.end.source,
        'i'
      ));

      const answerMatch = text.match(new RegExp(
        pattern.answerStart.source + '([\\s\\S]*?)' + pattern.answerEnd.source,
        'i'
      ));

      if (reasoningMatch || answerMatch) {
        return {
          reasoning: reasoningMatch ? reasoningMatch[1].trim() : null,
          answer: answerMatch ? answerMatch[1].trim() : null,
          hasReasoning: !!reasoningMatch,
          hasAnswer: !!answerMatch,
          original: text
        };
      }
    }

    // If no patterns match, return original text as answer
    return {
      reasoning: null,
      answer: text,
      hasReasoning: false,
      hasAnswer: true,
      original: text
    };
  }

  /**
   * Format reasoning steps for display
   * @param {string} reasoningText - The raw reasoning text
   * @returns {object} - Formatted reasoning with steps and metadata
   */
  static formatReasoningSteps(reasoningText) {
    if (!reasoningText) return null;

    // Parse numbered steps (e.g., "1. Step content")
    const stepPattern = /^\d+\.\s+\*\*(.+?)\*\*(.+?)(?=^\d+\.\s|\z)/gms;
    const steps = [];
    let match;

    while ((match = stepPattern.exec(reasoningText)) !== null) {
      steps.push({
        title: match[1].trim(),
        content: match[2].trim()
      });
    }

    // If no numbered steps found, try to parse by line breaks
    if (steps.length === 0) {
      const lines = reasoningText.split('\n').filter(line => line.trim());
      lines.forEach((line, index) => {
        // Check if line starts with a number
        const numberedMatch = line.match(/^(\d+)\.\s*(.+)/);
        if (numberedMatch) {
          const title = numberedMatch[2].replace(/\*\*/g, '').trim();
          steps.push({
            number: numberedMatch[1],
            title: title,
            content: ''
          });
        } else if (steps.length > 0) {
          // Add to previous step's content
          steps[steps.length - 1].content += '\n' + line;
        }
      });
    }

    return {
      steps: steps,
      rawText: reasoningText,
      stepCount: steps.length,
      characterCount: reasoningText.length,
      wordCount: reasoningText.split(/\s+/).filter(w => w).length
    };
  }

  /**
   * Extract key insights from reasoning
   * @param {string} reasoningText - The reasoning text
   * @returns {array} - Array of key insights or decisions
   */
  static extractInsights(reasoningText) {
    if (!reasoningText) return [];

    const insights = [];

    // Look for decision points and key observations
    const patterns = [
      /decision:\s*(.+)/gi,
      /observation:\s*(.+)/gi,
      /note:\s*(.+)/gi,
      /important:\s*(.+)/gi,
      /key finding:\s*(.+)/gi
    ];

    patterns.forEach(pattern => {
      let match;
      while ((match = pattern.exec(reasoningText)) !== null) {
        insights.push(match[1].trim());
      }
    });

    return insights;
  }

  /**
   * Get summary statistics about the reasoning trace
   * @param {object} parsedContent - Parsed reasoning content
   * @returns {object} - Statistics about the reasoning
   */
  static getReasoningStats(parsedContent) {
    if (!parsedContent || !parsedContent.reasoning) {
      return {
        hasReasoning: false,
        reasoningLength: 0,
        answerLength: 0,
        reasoningRatio: 0
      };
    }

    const reasoningLength = parsedContent.reasoning.length;
    const answerLength = parsedContent.answer ? parsedContent.answer.length : 0;
    const totalLength = reasoningLength + answerLength;

    return {
      hasReasoning: true,
      reasoningLength: reasoningLength,
      answerLength: answerLength,
      totalLength: totalLength,
      reasoningRatio: totalLength > 0 ? (reasoningLength / totalLength * 100).toFixed(1) : 0,
      reasoningWords: parsedContent.reasoning.split(/\s+/).filter(w => w).length,
      answerWords: parsedContent.answer ? parsedContent.answer.split(/\s+/).filter(w => w).length : 0
    };
  }

  /**
   * Format reasoning for export
   * @param {object} parsedContent - Parsed reasoning content
   * @param {boolean} includeReasoning - Whether to include reasoning in export
   * @returns {string} - Formatted text for export
   */
  static formatForExport(parsedContent, includeReasoning = true) {
    if (!parsedContent) return '';

    let exportText = '';

    if (includeReasoning && parsedContent.reasoning) {
      exportText += '=== MODEL REASONING ===\n\n';
      exportText += parsedContent.reasoning;
      exportText += '\n\n=== FINAL OUTPUT ===\n\n';
    }

    if (parsedContent.answer) {
      exportText += parsedContent.answer;
    }

    return exportText;
  }
}

// Export for use in other scripts
window.ReasoningParser = ReasoningParser;
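
A quick usage example for the class above, run against an invented <think>/<answer> string (the sample text is made up; the method calls match the code in this file):

```javascript
// Invented sample shaped like NuMarkdown-8B-Thinking output
const output = [
  '<think>',
  '1. **Identify the layout** The page is a two-column exam paper.',
  '2. **Transcribe headings** Keep section numbers as markdown headers.',
  '</think>',
  '<answer>## Section 1\nAnswer all questions.</answer>'
].join('\n');

if (ReasoningParser.detectReasoningTrace(output)) {
  const parsed = ReasoningParser.parseReasoningContent(output);
  const steps = ReasoningParser.formatReasoningSteps(parsed.reasoning);
  const stats = ReasoningParser.getReasoningStats(parsed);

  console.log(parsed.answer);        // "## Section 1\nAnswer all questions."
  console.log(steps.stepCount);      // number of numbered steps the formatter recognised
  console.log(stats.reasoningRatio); // reasoning's share of total characters, as a string like "78.3"
}
```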
linkedin-post.txt
ADDED
@@ -0,0 +1,18 @@
How well do VLM-based OCR models handle Victorian theatre playbills?

Last week I shared OCR Time Capsule for comparing traditional vs VLM-based OCR. I've now added some examples from a challenging collection: the British Library's Theatrical Playbills from Britain and Ireland.

These 150-year-old documents are brutal for OCR:
- Decorative fonts in every size imaginable
- Multi-column layouts with text at odd angles
- Faded ink and show-through from the reverse
- ALL CAPS DRAMATIC ANNOUNCEMENTS!!!

For this dataset I used the RolmOCR model from Reducto (processed via HF Jobs - love how easy UV scripts make GPU inference!). The results? The improvements over traditional OCR are even more dramatic than with exam papers.

Explore the app: https://huggingface.co/spaces/davanstrien/ocr-time-capsule
BL Theatre dataset: https://bl.iro.bl.uk/concern/datasets/a8534aff-c8e3-4fc8-adc1-da542080b1e3

I'll continue to work through the suggestions I got last week, but feel free to suggest other hairy OCR challenges to compare VLMs vs existing OCR!

#DigitalHumanities #OCR #GLAM #BritishLibrary #TheatreHistory
mobile-enhancement-plan.md
ADDED
@@ -0,0 +1,237 @@
# Mobile Enhancement Plan for OCR Time Capsule

## Overview

This document outlines the technical requirements for implementing comprehensive mobile support in OCR Time Capsule. While the application claims mobile support, the current implementation has significant limitations that prevent a good mobile user experience.

**Estimated Effort:** 800-1,200 lines of code changes
**Complexity:** Medium-High
**Development Time:** 3-5 days for full implementation, 2 days for MVP

## Current Mobile Limitations

1. **Fixed desktop layout** - Rigid 1/3 + 2/3 split doesn't adapt to small screens
2. **No touch support** - Navigation relies entirely on keyboard shortcuts
3. **Fixed positioning issues** - Footer overlaps content on mobile browsers
4. **Small touch targets** - Buttons/inputs too small for finger interaction
5. **Desktop-only interactions** - Hover states, dropdown menus not touch-friendly
6. **Overflow problems** - Content gets cut off due to fixed heights

## Required Changes

### 1. Layout Restructuring (Critical)

**Current:** Fixed side-by-side layout
```html
<!-- Current structure -->
<div class="flex-1 flex h-full">
  <div class="w-1/3">...</div> <!-- Image panel -->
  <div class="flex-1">...</div> <!-- Text panel -->
</div>
```

**Required:** Responsive stacked layout
```html
<!-- Mobile-first approach -->
<div class="flex flex-col md:flex-row h-full">
  <div class="w-full md:w-1/3">...</div>
  <div class="w-full md:flex-1">...</div>
</div>
```

**Changes needed:**
- Update all layout containers in `index.html` (~50 lines)
- Add mobile-specific CSS classes (~100 lines)
- Implement collapsible image panel for mobile

### 2. Touch Navigation Implementation

**New JavaScript required in `app.js`:**
```javascript
// Touch gesture handling
let touchStartX = 0;
let touchEndX = 0;

initTouchNavigation() {
  const container = document.getElementById('main-content');

  container.addEventListener('touchstart', (e) => {
    touchStartX = e.changedTouches[0].screenX;
  });

  container.addEventListener('touchend', (e) => {
    touchEndX = e.changedTouches[0].screenX;
    this.handleSwipe();
  });
}

handleSwipe() {
  const swipeThreshold = 50;
  const diff = touchStartX - touchEndX;

  if (Math.abs(diff) > swipeThreshold) {
    if (diff > 0) {
      this.nextSample(); // Swipe left
    } else {
      this.previousSample(); // Swipe right
    }
  }
}
```

**Scope:** ~150 lines for complete touch support including:
- Swipe detection
- Touch feedback
- Gesture velocity calculation
- Preventing accidental triggers

### 3. Mobile Navigation UI

**Replace fixed footer with mobile-friendly navigation:**
```html
<!-- Mobile navigation bar -->
<nav class="md:hidden fixed bottom-0 left-0 right-0 bg-white dark:bg-gray-800 border-t">
  <div class="grid grid-cols-3 h-16">
    <button class="flex items-center justify-center" @click="previousSample()">
      <svg class="w-8 h-8">...</svg>
    </button>
    <button class="flex items-center justify-center" @click="showPageSelector = true">
      <span class="text-lg font-medium" x-text="`${currentIndex + 1}/${totalSamples}`"></span>
    </button>
    <button class="flex items-center justify-center" @click="nextSample()">
      <svg class="w-8 h-8">...</svg>
    </button>
  </div>
</nav>
```

**Changes:** ~100 lines for navigation components

### 4. Touch-Friendly Components

**Update all interactive elements:**
- Minimum touch target size: 44x44px
- Add `touch-action` CSS properties
- Increase padding on all buttons
- Replace hover menus with tap-to-open modals

**Example button update:**
```html
<!-- Before -->
<button class="px-2 py-1 text-sm">Load</button>

<!-- After -->
<button class="px-4 py-3 md:px-2 md:py-1 text-base md:text-sm min-w-[44px] min-h-[44px] md:min-w-0 md:min-h-0">
  Load
</button>
```

### 5. Mobile Dock/Gallery

**Transform desktop dock to mobile carousel:**
```javascript
// Mobile-optimized thumbnail gallery
initMobileGallery() {
  this.mobileGallery = {
    currentIndex: 0,
    itemsPerView: 3,
    thumbnails: []
  };

  // Horizontal scroll with snap points
  const gallery = document.getElementById('mobile-gallery');
  gallery.style.scrollSnapType = 'x mandatory';
  gallery.style.overflowX = 'auto';
  gallery.style.webkitOverflowScrolling = 'touch';
}
```

**Scope:** ~200 lines for mobile gallery implementation

### 6. Responsive Breakpoints

**Implement proper breakpoint system:**
```css
/* Mobile first approach */
/* Base: Mobile (< 640px) */
.container {
  display: block;
  padding: 1rem;
}

/* Tablet (640px - 1024px) */
@media (min-width: 640px) {
  .container {
    display: flex;
    padding: 1.5rem;
  }
}

/* Desktop (> 1024px) */
@media (min-width: 1024px) {
  .container {
    padding: 2rem;
  }
}
```

### 7. Performance Optimizations

**Mobile-specific optimizations:**
- Lazy load images with Intersection Observer
- Reduce initial JavaScript bundle
- Implement virtual scrolling for large datasets
- Add `will-change` CSS for smooth animations

## Implementation Approach

### Phase 1: MVP (2 days)
1. Basic responsive layout
2. Touch navigation (swipe gestures)
3. Mobile-friendly buttons
4. Fix overflow issues

### Phase 2: Enhanced Mobile UX (2 days)
1. Mobile navigation bar
2. Touch-optimized dock
3. Page selector modal
4. Gesture refinements

### Phase 3: Polish (1 day)
1. Performance optimizations
2. PWA features
3. Cross-device testing
4. Documentation

## Testing Requirements

### Devices to Test
- **iOS:** iPhone SE, iPhone 12/13, iPad
- **Android:** Various screen sizes (5", 6", 7")
- **Browsers:** Safari iOS, Chrome Android, Firefox Mobile

### Key Test Scenarios
1. Portrait/landscape orientation changes
2. Touch gesture accuracy
3. Text readability at different zoom levels
4. Navigation button accessibility
5. Image loading performance on slow connections

## Code Impact Summary

| Component | Lines Changed | Complexity |
|-----------|--------------|------------|
| HTML Layout | 150-200 | Medium |
| CSS/Tailwind | 200-300 | Low-Medium |
| Touch Events | 150 | High |
| Mobile Navigation | 100 | Medium |
| Gallery/Dock | 200 | High |
| **Total** | **800-1,200** | **Medium-High** |

## Priority Recommendations

1. **Must Have:** Responsive layout, basic touch navigation
2. **Should Have:** Mobile navigation bar, touch-friendly buttons
3. **Nice to Have:** Gesture refinements, PWA features, animations

The most critical change is the layout restructuring - without this, other mobile features won't work properly. Start there and build up progressively.
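
The performance list in the plan above mentions lazy-loading images with Intersection Observer; a minimal sketch of that pattern (the `data-src` attribute and selector are placeholders, not existing markup in the app):

```javascript
// Lazy-load thumbnails as they approach the viewport (sketch; attribute names are placeholders).
const lazyLoader = new IntersectionObserver((entries, observer) => {
  for (const entry of entries) {
    if (!entry.isIntersecting) continue;
    const img = entry.target;
    img.src = img.dataset.src;    // the real URL is parked in data-src until needed
    observer.unobserve(img);      // each image only needs to load once
  }
}, { rootMargin: '200px' });      // start loading a little before the image scrolls into view

document.querySelectorAll('img[data-src]').forEach((img) => lazyLoader.observe(img));
```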
multi-ocr-comparison-ui-patterns.md
ADDED
@@ -0,0 +1,277 @@
# Multi-OCR Engine Comparison UI Patterns

## Executive Summary

This document outlines UI design patterns for comparing the results of 5+ OCR engines in the OCR Time Capsule application. Based on research of existing comparison tools and UI best practices, we recommend a hybrid approach combining selective comparison, matrix views, and progressive disclosure.

## Key Design Constraints

1. **Human Cognitive Limits**: Users can effectively compare 3-7 items simultaneously
2. **Screen Real Estate**: Limited horizontal space for side-by-side comparisons
3. **Information Density**: Need to show both text content and metadata
4. **Performance**: Rendering 5+ full texts simultaneously can impact performance

## Recommended UI Patterns

### 1. Selective Comparison Mode (Primary Recommendation)

Allow users to select 2-4 engines for detailed comparison from a larger set.

```
┌──────────────────────────────────────────────────────────┐
│ Select OCR Engines to Compare:                           │
│ ☐ Tesseract 5.0   ☐ Google Vision   ☐ AWS Textract       │
│ ☑ Azure AI        ☑ PaddleOCR       ☑ Surya OCR          │
│ ☐ EasyOCR         ☐ TrOCR           ☐ RolmOCR            │
│                                                          │
│ [Compare Selected (3)]                                   │
└──────────────────────────────────────────────────────────┘

After selection:
┌─────────┬─────────────┬─────────────┬─────────────┐
│ Image   │ Tesseract   │ Google      │ AWS         │
│ Preview │ 5.0         │ Vision      │ Textract    │
├─────────┼─────────────┼─────────────┼─────────────┤
│         │ Text output │ Text output │ Text output │
│ [IMG]   │ Lorem ipsum │ Lorem ipsum │ Lorem ipsum │
│         │ dolor sit   │ dolor sit   │ dolar sit   │
│         │ amet...     │ amet...     │ amet...     │
└─────────┴─────────────┴─────────────┴─────────────┘
```

**Advantages:**
- Maintains readable comparison
- User controls complexity
- Scalable to any number of engines

### 2. Matrix/Grid Overview

Show all results in a compact grid with expand/collapse functionality.

```
┌────────────────────────────────────────────────────────┐
│ OCR Engine Comparison Matrix                           │
├────────────┬───────────┬──────────┬─────────┬─────────┤
│ Engine     │ Accuracy  │ Time(ms) │ Preview │ Action  │
├────────────┼───────────┼──────────┼─────────┼─────────┤
│ Tesseract  │ 94.2%     │ 1250     │ Lorem...│ [View]  │
│ Google     │ 98.1%     │ 320      │ Lorem...│ [View]  │
│ AWS        │ 97.5%     │ 410      │ Lorem...│ [View]  │
│ Azure      │ 96.8%     │ 380      │ Lorem...│ [View]  │
│ PaddleOCR  │ 95.3%     │ 890      │ Lorem...│ [View]  │
│ Surya      │ 93.7%     │ 1100     │ Lorem...│ [View]  │
└────────────┴───────────┴──────────┴─────────┴─────────┘

Click [View] to see full text in modal/sidebar
```

**Advantages:**
- Shows all engines at once
- Easy to scan metrics
- Detailed view on demand

### 3. Reference + Diff View

Select one OCR result as reference and show diffs from others.

```
┌──────────────────────────────────────────────────────────┐
│ Reference: Google Vision OCR                             │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Lorem ipsum dolor sit amet, consectetur adipiscing   │ │
│ │ elit, sed do eiusmod tempor incididunt ut labore     │ │
│ └──────────────────────────────────────────────────────┘ │
│                                                          │
│ Differences from Reference:                              │
│ ┌─────────────┬──────────────────────────────────────┐  │
│ │ Tesseract   │ -dolor +dolar (char 12)              │  │
│ │             │ -adipiscing +adipiscing (char 38)    │  │
│ ├─────────────┼──────────────────────────────────────┤  │
│ │ AWS         │ -consectetur +consektetur (char 27)  │  │
│ ├─────────────┼──────────────────────────────────────┤  │
│ │ Azure       │ No differences                       │  │
│ └─────────────┴──────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
```

**Advantages:**
- Reduces visual complexity
- Easy to see variations
- Good for finding consensus

### 4. Accordion/Tab Hybrid

Combine tabs for primary views with accordions for details.

```
┌──────────────────────────────────────────────────────────┐
│ [Overview] [Side-by-Side] [Consensus] [Analytics]        │
├──────────────────────────────────────────────────────────┤
│ Overview Tab:                                            │
│                                                          │
│ ▼ Tesseract 5.0 (94.2% accuracy)                         │
│   Lorem ipsum dolor sit amet...                          │
│   [Show full text] [Compare with others]                 │
│                                                          │
│ ▶ Google Vision (98.1% accuracy)                         │
│ ▶ AWS Textract (97.5% accuracy)                          │
│ ▶ Azure AI (96.8% accuracy)                              │
│ ▶ PaddleOCR (95.3% accuracy)                             │
└──────────────────────────────────────────────────────────┘
```

**Advantages:**
- Progressive disclosure
- Maintains context
- Flexible navigation

### 5. Consensus/Voting View

Show agreement levels between engines.

```
┌──────────────────────────────────────────────────────────┐
│ Consensus View - 6 OCR Engines                           │
├──────────────────────────────────────────────────────────┤
│ Lorem ipsum █████ sit amet, ████████████ adipiscing      │
│             ^^^^^           ^^^^^^^^^^^^                 │
│             5/6 agree       6/6 agree (consensus)        │
│                                                          │
│ Disagreements:                                           │
│   Position 12-16: "dolor"                                │
│   - Tesseract: "dolar" (1 vote)                          │
│   - Others: "dolor" (5 votes) ✓                          │
│                                                          │
│   Position 27-38: "consectetur"                          │
│   - AWS: "consektetur" (1 vote)                          │
│   - Others: "consectetur" (5 votes) ✓                    │
└──────────────────────────────────────────────────────────┘
```

**Advantages:**
- Shows confidence levels
- Identifies problem areas
- Good for quality assessment

### 6. Layered Comparison

Stack results with transparency/overlay controls.

```
┌──────────────────────────────────────────────────────────┐
│ Layer Controls:                   Opacity       Visible  │
│ ┌──────────────────────────────┬──────────────┬───────┐  │
│ │                              │ ▓▓▓▓▓▓▓░░    │  ☑    │  │
│ │  [Overlaid Text View]        │ Tesseract    │       │  │
│ │                              ├──────────────┼───────┤  │
│ │  Multiple colored layers     │ ▓▓▓▓▓▓▓░░    │  ☑    │  │
│ │  showing differences         │ Google       │       │  │
│ │                              ├──────────────┼───────┤  │
│ │                              │ ▓▓▓▓▓▓▓░░    │  ☑    │  │
│ │                              │ AWS          │       │  │
│ └──────────────────────────────┴──────────────┴───────┘  │
└──────────────────────────────────────────────────────────┘
```

**Advantages:**
- Visual diff representation
- Adjustable comparison
- Good for alignment issues

## Metadata Display Patterns

### Inline Badges
```
┌──────────────────────────────────────────┐
│ Tesseract 5.0  [94.2%] [1.2s] [MIT]      │
│ Lorem ipsum dolor sit amet...            │
└──────────────────────────────────────────┘
```

### Hover Cards
```
┌──────────────────────────────────────────┐
│ Google Vision ⓘ                          │
│   ┌─────────────────────┐                │
│   │ Accuracy: 98.1%     │  (on hover)    │
│   │ Time: 320ms         │                │
│   │ Cost: $0.0015       │                │
│   │ Language: Multi     │                │
│   └─────────────────────┘                │
└──────────────────────────────────────────┘
```

## Navigation Patterns

### 1. Engine Selector Bar
```
[All] [High Accuracy] [Fast] [Open Source] [Custom Group]
```

### 2. Quick Switch
```
Previous Engine   [Tesseract ▼]   Next Engine
                   Google Vision
                   AWS Textract
                   Azure AI
```

### 3. Comparison History
```
Recent Comparisons:
• Tesseract vs Google vs AWS (2 min ago)
• All engines - Page 15 (5 min ago)
• Azure vs PaddleOCR (10 min ago)
```

## Mobile Considerations

For mobile devices, use a stacked card approach:

```
┌─────────────────┐
│ Original Image  │
├─────────────────┤
│ Tesseract 94.2% │
│ ▼ Show text     │
├─────────────────┤
│ Google 98.1%    │
│ ▶ Show text     │
├─────────────────┤
│ AWS 97.5%       │
│ ▶ Show text     │
└─────────────────┘
```

## Performance Optimizations

1. **Lazy Loading**: Only load full text when expanded/selected
2. **Virtual Scrolling**: For long documents
3. **Caching**: Store OCR results client-side
4. **Progressive Enhancement**: Start with 2-3 engines, load more on demand

## Recommended Implementation Priority

1. **Phase 1**: Selective Comparison (2-4 engines)
2. **Phase 2**: Matrix Overview with metrics
3. **Phase 3**: Consensus/Voting view
4. **Phase 4**: Advanced features (layers, history, etc.)

## Accessibility Considerations

- Keyboard navigation between engines
- Screen reader announcements for differences
- High contrast mode for diff highlighting
- Alternative text descriptions for visual comparisons

## Conclusion

The selective comparison pattern combined with a matrix overview provides the best balance of usability and functionality for comparing 5+ OCR engines. This approach:

- Respects cognitive limits (3-7 items)
- Provides overview and detail views
- Scales to any number of engines
- Maintains performance
- Works on mobile devices

The key is progressive disclosure: show summary information for all engines, but limit detailed comparison to user-selected subsets.
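
To make the selective-comparison constraint above concrete, here is a small sketch of selection state capped at four engines (the function and field names are illustrative, not existing app code):

```javascript
// Sketch: cap the comparison set, per the selective-comparison pattern (2-4 engines).
const MAX_COMPARED = 4;

function toggleEngine(state, engineId) {
  const selected = new Set(state.selectedEngines);
  if (selected.has(engineId)) {
    selected.delete(engineId);
  } else if (selected.size < MAX_COMPARED) {
    selected.add(engineId);
  } // else: ignore the click and surface a "limit reached" hint in the UI
  return { ...state, selectedEngines: [...selected] };
}

let state = { selectedEngines: ['tesseract', 'google-vision'] };
state = toggleEngine(state, 'rolmocr');
console.log(state.selectedEngines); // ['tesseract', 'google-vision', 'rolmocr']
```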