---
license: cc-by-4.0
---
# Automatic Translation Alignment of Ancient Greek Texts
GRC-ALIGNMENT is an XLM-RoBERTa-based model, trained on 12 million monolingual ancient Greek tokens with the Masked Language Model (MLM) training objective. The model is further fine-tuned on 45k parallel sentences, mainly ancient Greek-English, Greek-Latin, and Greek-Georgian.
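As a rough illustration, the sketch below loads the model with the Hugging Face `transformers` library and extracts contextual subword embeddings whose cosine similarities a translation aligner could compare across a Greek-English sentence pair. The repository ID `UGARIT/grc-alignment`, the example sentences, and the similarity-based matching step are assumptions for demonstration, not the official usage recipe.

```python
# Minimal sketch: load the model and compute a subword similarity matrix
# between an ancient Greek sentence and its English translation.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "UGARIT/grc-alignment"  # assumed Hugging Face repository ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

grc = "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"          # Iliad 1.1 (example input)
eng = "Sing, goddess, the wrath of Achilles son of Peleus"

with torch.no_grad():
    # Encode each sentence separately and take the last hidden states.
    grc_emb = model(**tokenizer(grc, return_tensors="pt")).last_hidden_state[0]
    eng_emb = model(**tokenizer(eng, return_tensors="pt")).last_hidden_state[0]

# Cosine-similarity matrix between subword embeddings; a greedy or
# softmax-based matching over this matrix yields candidate word alignments.
sim = torch.nn.functional.cosine_similarity(
    grc_emb.unsqueeze(1), eng_emb.unsqueeze(0), dim=-1
)
print(sim.shape)  # (number of Greek subwords, number of English subwords)
```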
### Multilingual Training Dataset
| Language Pairs | Sentences | Source |
|:---------------------------------------|:-----------:|:--------------------------------------------------------------------------------|
| GRC-ENG | 32,500 | Perseus Digital Library (Iliad, Odyssey, Xenophon, New Testament) |
| GRC-LAT | 8,200 | [Digital Fragmenta Historicorum Graecorum project](https://www.dfhg-project.org/) |
| GRC-KAT <br>GRC-ENG <br>GRC-LAT<br>GRC-ITA<br>GRC-POR | 4,000 | [UGARIT Translation Alignment Editor](https://ugarit.ialigner.com/) |