Commits · seanpedrickcase/document

Fixed issues with Gradio version 5.16. Fixed fuzzy search error on pages with no data.

3cecbfa

seanpedrickcase committed on Feb 13

Fuzzy match implementation for deny list. Added option to merge multiple review files. Review files from redaction step should now include text.

bde6e5b

seanpedrickcase committed on Jan 27

Ensured the OCR text outputs have no line breaks at the end. Multi-line custom text searches are now possible. Files for review are sent from the redact button. Fixed image redaction (not review yet). Can get user pool details from headers. Gradio update.

cb349ad

seanpedrickcase committed on Jan 21

Greatly improved regex for direct matching with custom entities

6ac4be4

seanpedrickcase committed on Jan 14

Uploaded PDFs with review files will now include all pages that don't have redactions. Slightly improved deny list matching.

613b1b4

seanpedrickcase committed on Jan 14

Refactor redaction functionality and enhance UI components: Added support for custom recognizers and whole page redaction options. Updated file handling to include new dropdowns for entity selection and improved dataframes for entity management. Enhanced the annotator with better state management and UI responsiveness. Cleaned up redundant code and improved overall performance in the redaction process.

1d772de

seanpedrickcase committed on Dec 24, 2024

Updated packages. Reinstituted multithreading with page load, now with order protected. Smaller spaCy model used for speed. Textract calls should now be faster.

f0c28d7

seanpedrickcase committed on Dec 19, 2024

Started adding support for custom deny list. Fixed Textract call issue. Removed multithreading for now as it mixes up pages.

e3365ed

seanpedrickcase committed on Dec 17, 2024

Comprehend now uses custom spaCy recognisers on top of defaults. Added zoom functionality to annotator. Fixed some PDF MediaBox issues and redacted image output issues.

ec98119

seanpedrickcase committed on Nov 8, 2024

Allowed for time limits on redact to avoid timeouts. Improved review interface. Now accepts only one file at a time. Upgraded Gradio version.

eea5c07

seanpedrickcase committed on Nov 5, 2024

Redaction tool can now export PDFs with selectable text retained: redacted text is deleted and covered with a black box. Licence change for PyMuPDF use.

339a165

seanpedrickcase committed on Sep 27, 2024

Generally improved OCR recognition of texts, corrected postcode regex

a748df6

seanpedrickcase committed on Sep 24, 2024

Optimised Textract and Tesseract workings

8652429

seanpedrickcase committed on Sep 24, 2024

Improved allow list, handwriting/signature identification, logging

6ea0852

seanpedrickcase committed on Sep 19, 2024

Version 0.1. Adapted code for PyInstaller local executable conversion (Windows).

2a4b347

seanpedrickcase committed on May 22, 2024

Initial commit

641ff3e

seanpedrickcase committed on Apr 25, 2024
