---
license: mit
pipeline_tag: token-classification
tags:
- BERT
- bioBERT
- NER
- medical
metrics:
- f1
language:
- en
---

# Model

NER model for recognizing disease, treatment, and technology entities. The model and the data are intended for educational use.

The original dataset tags have been augmented with "inside" tags to handle the sub-tokens produced by the WordPiece tokenizer. The following NER tags are used:

* `B-DISEASE`, `I-DISEASE`: begin and inside tags for disease
* `B-TREATMENT`, `I-TREATMENT`: begin and inside tags for treatment
* `B-TECHNOLOGY`, `I-TECHNOLOGY`: begin and inside tags for technology
* `O`: outside any entity (irrelevant)
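
The tag augmentation described above can be sketched as follows. This is a minimal illustration, not the preprocessing actually used for this model: it assumes each word has already been split into WordPiece sub-tokens, and the helper name `expand_word_tags` is hypothetical. Following the example in this card, the first sub-token of each labelled word receives the `B-` tag and the remaining sub-tokens receive the `I-` tag.

```python
from typing import List


def expand_word_tags(word_pieces: List[List[str]], word_tags: List[str]) -> List[str]:
    """Expand one tag per word into one tag per sub-token.

    word_pieces: sub-tokens of each word, e.g. [["men", "##ing", "##itis"]]
    word_tags:   word-level entity type per word, e.g. ["DISEASE"], or "O"
    """
    if len(word_pieces) != len(word_tags):
        raise ValueError("need exactly one tag per word")

    token_tags: List[str] = []
    for pieces, tag in zip(word_pieces, word_tags):
        if not pieces:
            # Nothing to tag for a word without sub-tokens.
            continue
        if tag == "O":
            token_tags.extend(["O"] * len(pieces))
        else:
            # First sub-token begins the entity, the rest are "inside" it.
            token_tags.append(f"B-{tag}")
            token_tags.extend([f"I-{tag}"] * (len(pieces) - 1))
    return token_tags


# Sub-token splits written by hand for illustration (assumed, not re-tokenized).
pieces = [["bacterial"], ["men", "##ing", "##itis"]]
print(expand_word_tags(pieces, ["DISEASE", "DISEASE"]))
# ['B-DISEASE', 'B-DISEASE', 'I-DISEASE', 'I-DISEASE']
```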

```
# Text:
Acute obstructive hydrocephalus complicating bacterial meningitis in childhood

# Real:
Acute -> DISEASE
obstructive -> DISEASE
hydrocephalus -> DISEASE
bacterial -> DISEASE
meningitis -> DISEASE

# Predictions:
o##bs##truct##ive -> B-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE
h##ydro##ce##pha##lus -> B-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE
bacterial -> B-DISEASE
men##ing##itis -> B-DISEASE + I-DISEASE + I-DISEASE
```
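
To compare predictions like the ones above with word-level labels, the sub-token tags have to be merged back into words. The sketch below is one possible way to do this and is not taken from the model's code: it assumes continuation pieces are marked with a leading `##` (the usual WordPiece convention) and that a word takes the entity type of its first sub-token. The function name `merge_subtokens` is hypothetical.

```python
from typing import List, Tuple


def merge_subtokens(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge WordPiece sub-tokens into words with one entity type each."""
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")

    words: List[Tuple[str, str]] = []
    for token, tag in zip(tokens, tags):
        if token.startswith("##") and words:
            # Continuation piece: append to the previous word, keep its label.
            word, label = words[-1]
            words[-1] = (word + token[2:], label)
        else:
            # New word: strip the B-/I- prefix, keep "O" unchanged.
            label = tag.split("-", 1)[1] if "-" in tag else tag
            words.append((token, label))
    return words


tokens = ["bacterial", "men", "##ing", "##itis"]
tags = ["B-DISEASE", "B-DISEASE", "I-DISEASE", "I-DISEASE"]
print(merge_subtokens(tokens, tags))
# [('bacterial', 'DISEASE'), ('meningitis', 'DISEASE')]
```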

# Sources

This pipeline is based on the [dmis-lab/biobert-base-cased-v1.2](https://huggingface.co/dmis-lab/biobert-base-cased-v1.2) pretrained model, fine-tuned on the relatively small [BeHealthy Medical Entity](https://www.kaggle.com/datasets/arunagirirajan/medical-entity-recognition-ner) dataset (1,550 training samples). The initial version of this model was then used to augment the medical technology [dataset](https://github.com/VictoriaDimanova/Robust-medical-NER/tree/main/Textcorpus). Both datasets were then used to train this model.

# Performance

The model has not been extensively tuned. The quality of the dataset is unclear because the origin of the data and the annotation process are unknown.

| Metric    | Score    |
|-----------|----------|
| Precision | 0.836892 |
| Recall    | 0.766610 |
| F1        | 0.800211 |
| Accuracy  | 0.935253 |
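
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows of the table. The snippet below only redoes that arithmetic; it says nothing about how the scores were measured (for example, entity level versus token level), which this card does not state.

```python
# Values copied from the table in this card.
precision = 0.836892
recall = 0.766610

# F1 = 2 * P * R / (P + R)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # agrees with the reported 0.800211
```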