metadata
tags:
- pytorch
- token-classification
- sequence-tagger-model
language: de
datasets:
- conll2003
- germeval_14
license: apache-2.0
About sbb_ner
This is a BERT model for named entity recognition (NER) in historical German.
It predicts the classes PER, LOC and ORG. The model is based on the 🤗 BERT base multilingual cased model.
We applied unsupervised pre-training on 2,333,647 pages of unlabeled historical German text from the Berlin State Library digital collections, and supervised pre-training on two datasets with contemporary German text, conll2003 and germeval_14.
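
Token classification models of this kind emit one tag per token, which usually has to be grouped into entity spans before use. The following is a minimal sketch of that grouping step, assuming the PER, LOC and ORG classes are encoded with a BIO scheme (`B-PER`, `I-PER`, `O`, ...). The tagging scheme, the function name `bio_to_spans` and the example sentence are illustrative assumptions and are not taken from the sbb_ner code.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (entity_text, entity_type) pairs.

    Assumption: tags look like "B-PER", "I-PER", "O". This is a common
    convention, not a documented property of the model above.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Store the entity collected so far, if there is one.
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A B- tag, or an I- tag that does not continue the open
            # entity, starts a new entity.
            flush()
            current_tokens, current_type = [token], label
        elif prefix == "I":
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" (or any unknown tag) closes the open entity.
            flush()
            current_tokens, current_type = [], None

    flush()
    return spans


# Hypothetical example sentence with hand-written tags.
tokens = ["Alexander", "von", "Humboldt", "reiste", "nach", "Berlin", "."]
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
```

Treating a stray `I-` tag as the start of a new entity is a lenient choice; a stricter decoder could discard such tags instead.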
Results
In a 5-fold cross-validation on different historical German NER corpora, the model obtained an F1 score of 84.3±1.1%.
For details, see our KONVENS 2019 paper or have a look at sbb_ner on GitHub.
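
A figure of the form "mean±spread" over cross-validation folds is commonly obtained by averaging the per-fold scores and reporting their standard deviation. The sketch below shows that arithmetic using only the Python standard library. It assumes the "±" denotes the sample standard deviation, which the text above does not state, and the per-fold values are invented placeholders, not the scores behind the reported result.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_folds(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(scores), stdev(scores)


def format_summary(scores: Sequence[float]) -> str:
    """Format fold scores as 'mean±std%' with one decimal place."""
    m, s = summarize_folds(scores)
    return f"{m:.1f}±{s:.1f}%"


# Hypothetical per-fold F1 scores in percent (placeholders only,
# NOT the values from the evaluation described above).
fold_f1 = [83.0, 84.0, 85.0, 84.5, 83.5]
print(format_summary(fold_f1))
```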
Weights
We provide model weights for PyTorch.
Model | Downloads |
---|---|
bert-sbb-de-finetuned | config.json • pytorch_model_ep7.bin • vocab.txt |
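
Before loading the model, it can help to confirm that all three files from the table are present in one local directory. The following sketch does only that check; the directory layout and the helper name `missing_files` are assumptions for illustration. Note that the weights file carries an epoch suffix (`pytorch_model_ep7.bin`), so loaders that look for a fixed default file name may need the file renamed or loaded explicitly.

```python
from pathlib import Path
from typing import List, Union

# File names as listed in the download table above.
EXPECTED_FILES = ("config.json", "pytorch_model_ep7.bin", "vocab.txt")


def missing_files(model_dir: Union[str, Path]) -> List[str]:
    """Return the expected model files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty return value means all three files were found; otherwise the list names what still needs to be downloaded.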