---
license: cc-by-4.0
---
# Automatic Translation Alignment of Ancient Greek Texts
GRC-ALIGNMENT is an XLM-RoBERTa-based model trained on 12 million monolingual Ancient Greek tokens with the Masked Language Model (MLM) training objective. The model is then fine-tuned on 45k parallel sentences, mainly Ancient Greek-English, Ancient Greek-Latin, and Ancient Greek-Georgian.
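Once fine-tuned, such a model is typically used by embedding the tokens of a source and a target sentence and keeping token pairs that are mutual nearest neighbors in the similarity matrix. The sketch below shows only that extraction step on toy embedding matrices (the function name and the random toy data are illustrative, not part of the model's API):

```python
import numpy as np

def extract_alignments(src_emb, tgt_emb):
    """Return mutual-best-match token alignments from two embedding matrices."""
    # Cosine-similarity matrix between source and target token embeddings.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T
    # Keep pairs that are each other's best match (argmax in both directions).
    fwd = sim.argmax(axis=1)  # best target index for each source token
    bwd = sim.argmax(axis=0)  # best source index for each target token
    return sorted((i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i)

# Toy embeddings: target tokens are reordered, lightly perturbed source tokens.
rng = np.random.default_rng(0)
base = rng.normal(size=(3, 8))
src_emb = base + 0.01 * rng.normal(size=(3, 8))
tgt_emb = base[[1, 0, 2]] + 0.01 * rng.normal(size=(3, 8))
print(extract_alignments(src_emb, tgt_emb))  # [(0, 1), (1, 0), (2, 2)]
```

In practice the embeddings would come from a hidden layer of the fine-tuned model rather than random data; the mutual-argmax filter is one common symmetrization heuristic among several.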
## Multilingual Training Dataset
| Languages | # Sentences | Source |
|---|---|---|
| GRC-ENG | 32,500 | Perseus Digital Library (Iliad, Odyssey, Xenophon, New Testament) |
| GRC-LAT | 8,200 | Digital Fragmenta Historicorum Graecorum project (https://www.dfhg-project.org/) |