tags:
  - pytorch
  - token-classification
  - sequence-tagger-model
language: de
datasets:
  - conll2003
  - germeval_14
license: apache-2.0

About sbb_ner

This is a BERT model for named entity recognition (NER) in historical German. It predicts the classes PER, LOC and ORG. The model is based on the 🤗 BERT base multilingual cased model.
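As an illustration of how per-token predictions over these three classes might be turned into entity spans, here is a minimal sketch. It assumes a BIO tagging scheme (labels such as `B-PER`, `I-PER`, `O`), which this card does not state explicitly. The function name `decode_bio` and the example sentence are hypothetical and not part of the sbb_ner code.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumption: tags follow the BIO scheme over PER, LOC and ORG.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the currently open entity, if there is one.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            flush()
            if tag.startswith("I-"):
                # Treat a stray I- tag as the start of a new entity.
                current_tokens, current_type = [token], tag[2:]
    flush()
    return entities


# Hypothetical example sentence and tags.
example_tokens = ["Alexander", "von", "Humboldt", "reiste", "von", "Berlin", "nach", "Paris", "."]
example_tags = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "O", "B-LOC", "O"]
example_entities = decode_bio(example_tokens, example_tags)
```

With the tags above, `example_entities` contains one PER entity and two LOC entities. Treating a stray `I-` tag as an entity start is a lenient design choice; a stricter decoder could discard such tokens instead.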

We applied unsupervised pre-training on 2,333,647 pages of unlabeled historical German text from the Berlin State Library digital collections, and supervised pre-training on two datasets with contemporary German text, conll2003 and germeval_14.

Results

In a 5-fold cross-validation on several historical German NER corpora, the model obtained an F1 score of 84.3±1.1%.
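A figure such as 84.3±1.1% is usually a mean over folds together with a measure of spread. The sketch below shows how such a summary could be computed. It assumes entity-level exact-match F1 and that the ± denotes the sample standard deviation across the five folds; neither is specified on this card. The per-fold scores are made-up placeholders, not results from the paper, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Set, Tuple

Entity = Tuple[str, str]  # (entity_text, entity_type)


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Exact-match F1 between gold and predicted entity sets."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def summarize_folds(f1_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold F1 scores."""
    return mean(f1_scores), stdev(f1_scores)


# Placeholder per-fold scores, for illustration only.
placeholder_scores = [0.80, 0.82, 0.84, 0.86, 0.88]
average, spread = summarize_folds(placeholder_scores)
print(f"F1 = {average * 100:.1f} ± {spread * 100:.1f}%")
```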

For details, see our KONVENS 2019 paper or the sbb_ner repository on GitHub.

Weights

We provide model weights for PyTorch.

| Model | Downloads |
|---|---|
| bert-sbb-de-finetuned | config.json, pytorch_model_ep7.bin, vocab.txt |
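The files listed above could be fetched programmatically, for example with the `huggingface_hub` library. This is a minimal sketch under two assumptions that this card does not confirm: that the repository id is `SBB/sbb_ner`, and that the three files sit at the repository root. The helper name `download_model_files` is hypothetical.

```python
from typing import Dict, List

# Assumption: repository id inferred from the organisation and model name.
REPO_ID = "SBB/sbb_ner"

# File names as listed in the downloads table of this card.
MODEL_FILES: List[str] = ["config.json", "pytorch_model_ep7.bin", "vocab.txt"]


def download_model_files(
    repo_id: str = REPO_ID, filenames: List[str] = MODEL_FILES
) -> Dict[str, str]:
    """Download each file and return a mapping from file name to local path.

    Assumption: the files are stored at the root of the repository.
    """
    # Imported lazily so the constants above can be used without the dependency.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(repo_id=repo_id, filename=name)
        for name in filenames
    }


# Usage (requires network access and `pip install huggingface_hub`):
#   paths = download_model_files()
#   print(paths["config.json"])
```

Note that the weights file is named `pytorch_model_ep7.bin` rather than the default `pytorch_model.bin`, so loaders that expect the default name may need the file path passed explicitly.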