UGARIT/grc-alignment

	---
	license: cc-by-4.0
	---
	# Automatic Translation Alignment of Ancient Greek Texts
	The GRC-ALIGNMENT model is an XLM-RoBERTa-based model trained on 12 million monolingual Ancient Greek tokens with a masked language modeling (MLM) objective. It was then fine-tuned on 45k parallel sentences, mainly Ancient Greek-English, Ancient Greek-Latin, and Ancient Greek-Georgian.
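
The card does not say how word alignments are read off the model's output. One common approach for embedding-based aligners is to compare token embeddings by cosine similarity and keep only the pairs that are each other's best match. The sketch below illustrates that idea with hand-made toy vectors standing in for real embeddings; the function names, the vectors, and the mutual-argmax rule itself are assumptions for illustration, not the authors' documented method.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mutual_argmax_alignment(src_vecs, tgt_vecs):
    """Return (src_index, tgt_index) pairs that are each other's best match.

    Hypothetical helper: a pair is kept only if the target token is the most
    similar one for the source token AND vice versa.
    """
    if not src_vecs or not tgt_vecs:
        return []
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    # Best target index for every source token.
    best_tgt = [max(range(len(tgt_vecs)), key=lambda j: row[j]) for row in sim]
    # Best source index for every target token.
    best_src = [
        max(range(len(src_vecs)), key=lambda i: sim[i][j])
        for j in range(len(tgt_vecs))
    ]
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]

# Toy 3-dimensional vectors (assumed values, NOT real model output).
SRC_TOKENS = ["μῆνιν", "ἄειδε", "θεά"]
TGT_TOKENS = ["sing", "goddess", "wrath"]
SRC_VECS = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
TGT_VECS = [[0.0, 1.0, 0.0], [0.0, 0.1, 1.0], [1.0, 0.1, 0.0]]

for i, j in mutual_argmax_alignment(SRC_VECS, TGT_VECS):
    print(f"{SRC_TOKENS[i]} -> {TGT_TOKENS[j]}")
```

In practice the vectors would come from the model's hidden states for each token rather than being written by hand; the mutual-best-match filter favors precision, since tokens without a clear counterpart are simply left unaligned.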

	### Multilingual Training Dataset
	| Languages | # Sentences | Source |
	|:---------------------------------------:|:-----------:|:--------------------------------------------------------------------------------:|
	| GRC-ENG | 32,500 | Perseus Digital Library (Iliad, Odyssey, Xenophon, New Testament) |
	| GRC-LAT | 8,200 | [Digital Fragmenta Historicorum Graecorum project](https://www.dfhg-project.org/) |
	| GRC-KAT, GRC-ENG, GRC-LAT, GRC-ITA, GRC-POR | 4,000 | [UGARIT Translation Alignment Editor](https://ugarit.ialigner.com/) |
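
As a quick sanity check, the per-source counts in the table can be summed and compared with the "45k parallel sentences" figure quoted above. This sketch assumes the dots in the original counts are thousands separators (so "32.500" means 32,500); the dictionary keys are shorthand labels introduced here.

```python
# Sentence counts per source, assuming "." in the table is a thousands separator.
corpus_sizes = {
    "GRC-ENG (Perseus)": 32_500,
    "GRC-LAT (DFHG)": 8_200,
    "Mixed pairs (UGARIT editor)": 4_000,
}

total = sum(corpus_sizes.values())

for name, count in corpus_sizes.items():
    # Show each source's share of the fine-tuning data.
    print(f"{name}: {count:,} sentences ({count / total:.1%})")
print(f"Total: {total:,} sentences")
```

Under that reading the sources add up to 44,700 sentences, which matches the rounded 45k figure, with the Perseus Greek-English data making up most of the fine-tuning set.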