---
license: cc-by-4.0
---
# Automatic Translation Alignment of Ancient Greek Texts
GRC-ALIGNMENT is an XLM-RoBERTa-based model, trained on 12 million monolingual ancient Greek tokens with the Masked Language Model (MLM) training objective. The model was further fine-tuned on 45k parallel sentences, mainly ancient Greek-English, ancient Greek-Latin, and ancient Greek-Georgian pairs.
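A common way to extract word alignments from a fine-tuned multilingual encoder like this one is to embed both sentences, build a token-level similarity matrix, and keep the pairs that are mutual best matches (bidirectional argmax). The sketch below illustrates only that extraction step on toy embedding vectors; the `align` helper and the toy data are illustrative assumptions, not part of the released model, whose real inputs would be contextual embeddings produced by the encoder.

```python
# Sketch of alignment extraction via bidirectional argmax over a
# token-similarity matrix. The toy embeddings stand in for real
# encoder outputs (e.g. hidden states from an XLM-R-based model).
import numpy as np

def align(src_emb: np.ndarray, tgt_emb: np.ndarray) -> list:
    """Return (src_index, tgt_index) pairs that are each other's best match."""
    # Normalize rows so the dot product is cosine similarity.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                 # rows: source tokens, cols: target tokens
    fwd = sim.argmax(axis=1)          # best target for each source token
    bwd = sim.argmax(axis=0)          # best source for each target token
    # Keep only mutual best matches (intersection of both directions).
    return [(i, int(fwd[i])) for i in range(len(fwd)) if bwd[fwd[i]] == i]

# Toy 3-token "sentences": tokens 0 and 1 swap order, token 2 stays in place.
src = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]])
tgt = np.array([[0.0, 1.0, 0.1], [1.0, 0.1, 0.0], [0.1, 0.0, 1.0]])
print(align(src, tgt))  # [(0, 1), (1, 0), (2, 2)]
```

The mutual-best-match filter trades recall for precision: a pair is kept only when each token picks the other, which suppresses spurious one-directional matches.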
## Multilingual Training Dataset
| Languages | # Sentences | Source |
|---|---|---|
| GRC-ENG | 32,500 | Perseus Digital Library (Iliad, Odyssey, Xenophon, New Testament) |
| GRC-LAT | 8,200 | Digital Fragmenta Historicorum Graecorum project |
| GRC-KAT, GRC-ENG, GRC-LAT, GRC-ITA, GRC-POR | 4,000 | UGARIT Translation Alignment Editor |