---
license: mit
pipeline_tag: token-classification
tags:
- BERT
- bioBERT
- NER
- medical
metrics:
- f1
language:
- en
---
# Model
NER model for recognizing disease, treatment, and technology entities. The model and the data are intended for educational use.
The original dataset tags have been augmented with "inside" tags to handle the sub-tokens produced by the WordPiece tokenizer. The following NER tags are used:
* `B-DISEASE`, `I-DISEASE`: begin and inside tags for disease
* `B-TREATMENT`, `I-TREATMENT`: begin and inside tags for treatment
* `B-TECHNOLOGY`, `I-TECHNOLOGY`: begin and inside tags for technology
* `O`: outside any entity (irrelevant)
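The tag augmentation described above can be sketched roughly as follows. This is a minimal illustration, not the author's preprocessing code: the helper name `expand_word_labels` is hypothetical, the WordPiece splits are written by hand rather than produced by a tokenizer, and the rule (first sub-token of an entity word gets `B-`, later sub-tokens get `I-`) is an assumption inferred from the tag list.

```python
from typing import List, Tuple


def expand_word_labels(
    pieces_per_word: List[List[str]],
    word_labels: List[str],
) -> List[Tuple[str, str]]:
    """Give every sub-token a tag: B- on the first piece of an entity word, I- on the rest.

    Hypothetical helper; words labelled "O" keep "O" on all of their pieces.
    """
    if len(pieces_per_word) != len(word_labels):
        raise ValueError("need exactly one label per word")

    tagged: List[Tuple[str, str]] = []
    for pieces, label in zip(pieces_per_word, word_labels):
        for i, piece in enumerate(pieces):
            if label == "O":
                tag = "O"
            else:
                tag = ("B-" if i == 0 else "I-") + label
            tagged.append((piece, tag))
    return tagged


# Hand-written sub-token splits, assumed for illustration only.
words = [["men", "##ing", "##itis"], ["in"], ["childhood"]]
labels = ["DISEASE", "O", "O"]

for piece, tag in expand_word_labels(words, labels):
    print(f"{piece:10s} {tag}")
```

With this scheme every sub-token carries a label, so the token-classification loss can be computed over all pieces instead of masking the continuation pieces.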
```
# Text:
Acute obstructive hydrocephalus complicating bacterial meningitis in childhood
# Real:
Acute -> DISEASE
obstructive -> DISEASE
hydrocephalus -> DISEASE
bacterial -> DISEASE
meningitis -> DISEASE
# Predictions:
o##bs##truct##ive -> B-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE
h##ydro##ce##pha##lus -> B-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE + I-DISEASE
bacterial -> B-DISEASE
men##ing##itis -> B-DISEASE + I-DISEASE + I-DISEASE
```
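Sub-token predictions like those in the example can be collapsed back to word level before comparing them with the original annotations. The sketch below is one possible approach, not part of the model itself: `merge_subtokens` is a hypothetical helper, and it assumes that continuation pieces start with `##` and that the first piece's tag determines the entity type of the whole word.

```python
from typing import List, Tuple


def merge_subtokens(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Join '##' continuation pieces onto the preceding piece.

    The entity type of a word is taken from its first piece, with the
    B-/I- prefix stripped; "O" is passed through unchanged.
    """
    words: List[Tuple[str, str]] = []
    for token, tag in zip(tokens, tags):
        if token.startswith("##") and words:
            # Continuation piece: append its text to the current word.
            word, label = words[-1]
            words[-1] = (word + token[2:], label)
        else:
            # New word: strip the B-/I- prefix if there is one.
            label = tag.split("-", 1)[1] if "-" in tag else tag
            words.append((token, label))
    return words


tokens = ["bacterial", "men", "##ing", "##itis"]
tags = ["B-DISEASE", "B-DISEASE", "I-DISEASE", "I-DISEASE"]
print(merge_subtokens(tokens, tags))
```

Taking the label from the first piece is a simple design choice; a majority vote over all pieces of a word would be an alternative when the pieces disagree.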
# Sources
This pipeline is based on the pretrained [dmis-lab/biobert-base-cased-v1.2](https://huggingface.co/dmis-lab/biobert-base-cased-v1.2) model,
first fine-tuned on the relatively small [BeHealthy Medical Entity](https://www.kaggle.com/datasets/arunagirirajan/medical-entity-recognition-ner)
dataset (1,550 training samples). That initial version of the model was then used
to augment the medical technology [dataset](https://github.com/VictoriaDimanova/Robust-medical-NER/tree/main/Textcorpus). Both datasets were then used to train
this model.
# Performance
The model has not been extensively tuned. The quality of the dataset is unclear because the origin of the data and the annotation process are unknown.
| Metric | Score |
|-----------|----------|
| Precision | 0.836892 |
| Recall | 0.766610 |
| F1 | 0.800211 |
| Accuracy | 0.935253 |
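As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the first two rows of the table. The snippet below only does that arithmetic on the rounded values reported above; whether the scores were computed per token or per entity is not stated in this card, and the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the table above.
precision = 0.836892
recall = 0.766610

f1 = f1_score(precision, recall)
print(f"{f1:.6f}")
```

The recomputed value agrees with the reported F1 to the precision shown in the table.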