MattStammers committed
Commit 99bb014 · 1 Parent(s): 8ce1c03

readme fixed

Files changed (1)
  1. app.py +3 -2
app.py CHANGED
@@ -322,6 +322,7 @@ The full results of the tool are given in <i>Table 1</i> below.
  | Precision | 0.91 | 0.93 | 0.28 |
  | Recall | 0.94 | 0.16 | 0.49 |
  | F1 Score | 0.93 | 0.28 | 0.36 |
+
  <small><i>Table 1: Summary of Model Performance Metrics</i></small>

  ### Strengths
@@ -329,12 +330,12 @@ The full results of the tool are given in <i>Table 1</i> below.

  - [The Stanford De-Identifier Base Model](https://huggingface.co/StanfordAIMI/stanford-deidentifier-base)[1] is 99% accurate on our test set of radiology reports and achieves an F1 score of 93% on our challenging open-source benchmark. The other models are included mainly to demonstrate that Pteredactyl can deploy any transformer model.

- - We have submitted the code to [OHDSI](https://www.ohdsi.org/) as an abstract and aim to incorporate it into a wider open-source effort to solve intractable clinical informatics problems.
+ - We have submitted the code to [OHDSI](https://www.ohdsi.org/) as an abstract and aim to incorporate it into a wider open-source effort to solve intractable clinical informatics problems.

  ### Limitations
  - The tool was not initially designed to redact clinic letters, as it was developed primarily on radiology reports in the US. We have added some augmentations to cover elements such as postcodes using checksums, but these might not always work; the same is true of NHS numbers, as illustrated above.

- - It may redact text too aggressively because it was built as a research tool in which precision is prized over recall. In our experience, however, this is uncommon enough that the tool remains very useful.
+ - It may redact text too aggressively because it was built as a research tool in which precision is prized over recall. In our experience, however, this is uncommon enough that the tool remains very useful.

  - This is very much a research tool and should not be relied upon as a catch-all in any production capacity. The app makes these limitations transparent via the attached confusion matrix.

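For readers who want to try the benchmarked model mentioned in the Strengths section above, the sketch below shows one way to run it through the Hugging Face `transformers` token-classification pipeline. This is an illustrative example only, not the code inside `app.py`, and the sample report text is invented.

```python
# Minimal, illustrative sketch: run the benchmarked Stanford de-identifier as a
# Hugging Face token-classification pipeline. Not Pteredactyl's own code.
from transformers import pipeline

deid = pipeline(
    "token-classification",
    model="StanfordAIMI/stanford-deidentifier-base",
    aggregation_strategy="simple",  # merge word pieces into whole entity spans
)

# Invented sample text for illustration only.
report = "CT head for John Smith, NHS number 943 476 5919, seen on 12 March 2024."
for entity in deid(report):
    # Each aggregated entity carries a label, a confidence score, and character
    # offsets that a redaction step could use to mask the span.
    print(entity["entity_group"], round(float(entity["score"]), 3),
          report[entity["start"]:entity["end"]])
```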
 
 
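The Limitations bullet above mentions checksum-based handling of identifiers such as NHS numbers. For context, NHS numbers carry a modulus-11 check digit, so candidate strings can be sanity-checked before redaction. The sketch below illustrates that public algorithm; it is not the checksum code used inside `app.py`.

```python
# Illustrative sketch of the public NHS number modulus-11 check digit
# algorithm; not the implementation inside app.py.
def is_valid_nhs_number(value: str) -> bool:
    digits = [int(c) for c in value if c.isdigit()]
    if len(digits) != 10:
        return False
    # Weight the first nine digits by 10 down to 2 and sum the products.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:  # 10 can never be a valid check digit
        return False
    return check == digits[9]


print(is_valid_nhs_number("943 476 5919"))  # True  (commonly cited example number)
print(is_valid_nhs_number("943 476 5918"))  # False (check digit does not match)
```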