Commits · seanpedrickcase/document

Now local OCR outputs can be saved to file and reloaded to save preparation time. Bug fixing in logs and tabular data redaction. Update to documentation

f93e49c

seanpedrickcase committed on Apr 28

Improved logging format a little. Now possible to save logs to DynamoDB

0042e78

seanpedrickcase committed on Apr 27

More config options. Fixed some bugs with removing elements from review page and Adobe export. Some UI rearrangements

6319afc

seanpedrickcase committed on Mar 24

Integrated AWS Comprehend and fuzzy matching functions with tabular data redaction.

ff290e1

seanpedrickcase committed on Mar 5

Allowed for Textract and Comprehend API calls through AWS keys. File preparation function incorporated into main redaction function to avoid needing user to 'check in' during redaction process

391712c

seanpedrickcase committed on Feb 25

Fuzzy match implementation for deny list. Added option to merge multiple review files. Review files from redaction step should now include text.

bde6e5b

seanpedrickcase committed on Jan 27

Ensured the text ocr outputs have no line breaks at end. Multi-line custom text searches now possible. Files for review sent from redact button. Fixed image redaction (not review yet). Can get user pool details from headers. Gradio update.

cb349ad

seanpedrickcase committed on Jan 21

App should now resize images that are too large before sending to Textract. Textract now more robust to failure. Improved reliability of json conversion to review dataframe

143e2cc

seanpedrickcase committed on Jan 15

Greatly improved regex for direct matching with custom entities

6ac4be4

seanpedrickcase committed on Jan 14

Started adding in support for custom deny list. Fixed textract call issue. Removed multithreading for now as it mixes up pages

e3365ed

seanpedrickcase committed on Dec 17, 2024

Only shows AWS options when AWS functions enabled. Can now upload previous review files to continue review later. Some review debugging.

e2aae24

seanpedrickcase committed on Nov 18, 2024

Comprehend now uses custom spacy recognisers on top of defaults. Added zoom functionality to annotator. Fixed some pdf mediabox issues and redacted image output issues.

ec98119

seanpedrickcase committed on Nov 8, 2024

Consolidated AWS Comprehend redaction calls to reduce total number

542c252

seanpedrickcase committed on Nov 6, 2024

When on AWS, now loads in a default allow_list to exclude common words from redaction. Improved checks on AWS Comprehend calls.

390bef2

seanpedrickcase committed on Nov 6, 2024

Improved logging

8235bbb

seanpedrickcase committed on Nov 6, 2024

Added support for AWS Comprehend for PII identification. OCR and detection results now written to main output

f0f9378

seanpedrickcase committed on Nov 5, 2024

Allowed for time limits on redact to avoid timeouts. Improved review interface. Now accepts only one file at a time. Upgraded Gradio version

eea5c07

seanpedrickcase committed on Nov 5, 2024

Redaction tool can now export pdfs with selectable text retained - redacted text is deleted and covered with a black box. Licence change for pymupdf use.

339a165

seanpedrickcase committed on Sep 27, 2024

General improvement in quick image matching and merging

84c83c0

seanpedrickcase committed on Sep 26, 2024

Generally improved OCR recognition of texts, corrected postcode regex

a748df6

seanpedrickcase committed on Sep 24, 2024

Optimised Textract and Tesseract workings

8652429

seanpedrickcase committed on Sep 24, 2024

Improved allow list, handwriting/signature identification, logging

6ea0852

seanpedrickcase committed on Sep 19, 2024

Added AWS Textract support. Allowed for OCR logs export.

e9c4101

seanpedrickcase committed on Sep 18, 2024
