document_redaction / tools / load_spacy_model_custom_recognisers.py

Commit History

Fixed issues with Gradio version 5.16. Fixed fuzzy search error on pages with no data.
3cecbfa

seanpedrickcase committed

Fuzzy match implementation for deny list. Added option to merge multiple review files. Review files from redaction step should now include text.
bde6e5b

seanpedrickcase committed

Ensured the text OCR outputs have no line breaks at the end. Multi-line custom text searches now possible. Files for review sent from redact button. Fixed image redaction (not review yet). Can get user pool details from headers. Gradio update.
cb349ad

seanpedrickcase committed

Greatly improved regex for direct matching with custom entities.
6ac4be4

seanpedrickcase committed

Uploaded PDFs with review files will now include all pages that don't have redactions. Slightly improved deny list matching.
613b1b4

seanpedrickcase committed

Refactor redaction functionality and enhance UI components: Added support for custom recognizers and whole page redaction options. Updated file handling to include new dropdowns for entity selection and improved dataframes for entity management. Enhanced the annotator with better state management and UI responsiveness. Cleaned up redundant code and improved overall performance in the redaction process.
1d772de

seanpedrickcase committed

Updated packages. Reinstated multithreading for page loading, now with page order preserved. Smaller spaCy model used for speed. Textract calls should now be faster.
f0c28d7

seanpedrickcase committed

Started adding support for a custom deny list. Fixed Textract call issue. Removed multithreading for now as it mixes up pages.
e3365ed

seanpedrickcase committed

Comprehend now uses custom spaCy recognisers on top of defaults. Added zoom functionality to annotator. Fixed some PDF mediabox issues and redacted image output issues.
ec98119

seanpedrickcase committed

Allowed for time limits on redact to avoid timeouts. Improved review interface. Now accepts only one file at a time. Upgraded Gradio version.
eea5c07

seanpedrickcase committed

Redaction tool can now export PDFs with selectable text retained: redacted text is deleted and covered with a black box. Licence change for PyMuPDF use.
339a165

seanpedrickcase committed

Generally improved OCR recognition of text, corrected postcode regex.
a748df6

seanpedrickcase committed

Optimised Textract and Tesseract processing.
8652429

seanpedrickcase committed

Improved allow list, handwriting/signature identification, and logging.
6ea0852

seanpedrickcase committed

Version 0.1. Adapted code for PyInstaller local executable conversion (Windows).
2a4b347

seanpedrickcase committed