Ancient Greek BERT
The first and only available Ancient Greek sub-word BERT model!
Achieves state-of-the-art performance after fine-tuning on Part-of-Speech Tagging and Morphological Analysis.
Pre-trained weights are made available for a standard 12-layer, 768-dimensional BERT-base model.
Further scripts for using the model and fine-tuning it for PoS Tagging are available on our GitHub repository!
Please refer to our paper: "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek", in Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021).
How to use
The model can be loaded directly from the HuggingFace Model Hub with:
from transformers import AutoTokenizer, AutoModel
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
model = AutoModel.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
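As a minimal usage sketch (the Ancient Greek example sentence is only illustrative), the loaded tokeniser and model can then produce contextual sub-word embeddings:

import torch

sentence = "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"  # example input; any Ancient Greek text works
inputs = tokeniser(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# last_hidden_state has shape (1, number_of_sub_word_tokens, 768)
print(outputs.last_hidden_state.shape)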
Training data
The model was initialised from AUEB NLP Group's Greek BERT and subsequently trained on monolingual data from the First1KGreek Project, the Perseus Digital Library, the PROIEL Treebank, and Gorman's Treebank.
Training and Eval details
Standard de-accentuation and lower-casing for Greek, as suggested in AUEB NLP Group's Greek BERT, was applied to the training data.
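A rough sketch of this preprocessing (an assumed re-implementation using Python's unicodedata, not the exact script used for training) looks like:

import unicodedata

def strip_accents_and_lowercase(text: str) -> str:
    # Decompose characters, drop combining marks (accents, breathings, iota subscripts),
    # then recompose and lower-case.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped).lower()

print(strip_accents_and_lowercase("Μῆνιν ἄειδε"))  # -> "μηνιν αειδε"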
The model was trained on 4 NVIDIA Tesla V100 16GB GPUs for 80 epochs, with a max sequence length of 512, and reaches a perplexity of 4.8 on the held-out test set. It also achieves state-of-the-art results when fine-tuned for PoS Tagging and Morphological Analysis on all 3 treebanks, averaging >90% accuracy. Please consult our paper or contact me for further questions!
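For reference, a hedged sketch of how such a fine-tuning setup could be started with a token-classification head (the actual scripts, hyper-parameters and tag sets are in our GitHub repository; the num_labels value below is a placeholder):

from transformers import AutoModelForTokenClassification, AutoTokenizer

num_pos_tags = 14  # placeholder; use the tag set of the treebank you fine-tune on
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
pos_model = AutoModelForTokenClassification.from_pretrained(
    "pranaydeeps/Ancient-Greek-BERT", num_labels=num_pos_tags
)
# pos_model adds a randomly initialised classification layer on top of the
# pre-trained encoder; it can then be trained on treebank data, predicting
# one tag per sub-word token.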