document_redaction/tools/helper_functions.py

Commit History

Fuzzy match implementation for deny list. Added option to merge multiple review files. Review files from redaction step should now include text.
bde6e5b

seanpedrickcase committed on

Added capabilities to export to and import from Adobe .xfdf files
6b28cfa

seanpedrickcase committed on

Added tab to be able to compare pages across multiple documents and redact duplicates
a265560

seanpedrickcase committed on

Ensured the OCR text outputs have no line breaks at the end. Multi-line custom text searches now possible. Files for review sent from redact button. Fixed image redaction (not review yet). Can get user pool details from headers. Gradio update.
cb349ad

seanpedrickcase committed on

App should now resize images that are too large before sending to Textract. Textract now more robust to failure. Improved reliability of JSON conversion to review dataframe
143e2cc

seanpedrickcase committed on

Refactor redaction functionality and enhance UI components: Added support for custom recognizers and whole page redaction options. Updated file handling to include new dropdowns for entity selection and improved dataframes for entity management. Enhanced the annotator with better state management and UI responsiveness. Cleaned up redundant code and improved overall performance in the redaction process.
1d772de

seanpedrickcase committed on

Enhance file handling and UI features: improved Gradio app layout with fill width option, and integrated new settings for deny and fully redacted lists (placeholders so far). Updated file conversion functions to handle CSV inputs and added CSV review file generation for redactions. Now retains all original and merged redaction boxes.
a770956

seanpedrickcase committed on

Updated packages. Reinstated multithreading on page load, now with page order preserved. Smaller spaCy model used for speed. Textract calls should now be faster
f0c28d7

seanpedrickcase committed on

Removed default custom header values so as not to cause errors
7f5a542

seanpedrickcase committed on

Modified Dockerfile, hopefully removing the need for Lambda overrides. Looking into custom headers from CloudFront to try to get them to work
bf7bb79

seanpedrickcase committed on

Only shows AWS options when AWS functions are enabled. Can now upload previous review files to continue review later. Some review debugging.
e2aae24

seanpedrickcase committed on

Comprehend now uses custom spaCy recognisers on top of defaults. Added zoom functionality to annotator. Fixed some PDF MediaBox issues and redacted image output issues.
ec98119

seanpedrickcase committed on

When on AWS, now loads in a default allow_list to exclude common words from redaction. Improved checks on AWS Comprehend calls.
390bef2

seanpedrickcase committed on

Allowed for time limits on redaction to avoid timeouts. Improved review interface. Now accepts only one file at a time. Upgraded Gradio version
eea5c07

seanpedrickcase committed on

Improved allow list, handwriting/signature identification, logging
6ea0852

seanpedrickcase committed on

Updated time sum function to sum correctly
e1c402a

seanpedrickcase committed on

Handles multiple runs with multiple files correctly now. Logging and feedback improvements.
bbf818d

seanpedrickcase committed on

Decision process now saved as log files. Other log files and feedback added
8c33828

seanpedrickcase committed on

Added logging, anonymising all Excel sheets, simple redaction tags, some Dockerfile optimisation
01c88c0

seanpedrickcase committed on

Minor bug fix to connection parameter function
275c820

seanpedrickcase committed on

Added option to authenticate with AWS Cognito on load. Other minor changes.
bc22fc4

seanpedrickcase committed on

Can now redact text or csv/xlsx files. Can redact multiple files. Embeds redactions as image-based file by default
7810536

seanpedrickcase committed on

Better redaction output formatting. Custom output folders allowed. Upgraded Gradio version
12224f5

seanpedrickcase committed on

Version 0.1. Adapted code for PyInstaller local executable conversion (Windows)
2a4b347

seanpedrickcase committed on