document_redaction/tools/custom_image_analyser_engine.py

Commit History

Allowed Textract and Comprehend API calls through AWS keys. File preparation function incorporated into the main redaction function so that the user no longer needs to 'check in' during the redaction process.
391712c

seanpedrickcase committed on

Fuzzy match implementation for deny list. Added option to merge multiple review files. Review files from redaction step should now include text.
bde6e5b

seanpedrickcase committed on

Ensured the text OCR outputs have no line breaks at the end. Multi-line custom text searches now possible. Files for review sent from redact button. Fixed image redaction (not review yet). Can get user pool details from headers. Gradio update.
cb349ad

seanpedrickcase committed on

App should now resize images that are too large before sending to Textract. Textract now more robust to failure. Improved reliability of JSON conversion to review dataframe.
143e2cc

seanpedrickcase committed on

Greatly improved regex for direct matching with custom entities
6ac4be4

seanpedrickcase committed on

Started adding support for a custom deny list. Fixed Textract call issue. Removed multithreading for now as it mixes up pages.
e3365ed

seanpedrickcase committed on

Only shows AWS options when AWS functions enabled. Can now upload previous review files to continue review later. Some review debugging.
e2aae24

seanpedrickcase committed on

Comprehend now uses custom spaCy recognisers on top of defaults. Added zoom functionality to annotator. Fixed some PDF mediabox issues and redacted image output issues.
ec98119

seanpedrickcase committed on

Consolidated AWS Comprehend redaction calls to reduce the total number of calls
542c252

seanpedrickcase committed on

When on AWS, now loads in a default allow_list to exclude common words from redaction. Improved checks on AWS Comprehend calls.
390bef2

seanpedrickcase committed on

Added support for AWS Comprehend for PII identification. OCR and detection results now written to main output
f0f9378

seanpedrickcase committed on

Allowed for time limits on redact to avoid timeouts. Improved review interface. Now accepts only one file at a time. Upgraded Gradio version
eea5c07

seanpedrickcase committed on

Redaction tool can now export PDFs with selectable text retained: redacted text is deleted and covered with a black box. Licence change for PyMuPDF use.
339a165

seanpedrickcase committed on

General improvement in quick image matching and merging
84c83c0

seanpedrickcase committed on

Generally improved OCR recognition of text, corrected postcode regex
a748df6

seanpedrickcase committed on

Optimised Textract and Tesseract workings
8652429

seanpedrickcase committed on

Improved allow list, handwriting/signature identification, logging
6ea0852

seanpedrickcase committed on

Added AWS Textract support. Allowed for OCR logs export.
e9c4101

seanpedrickcase committed on