Added information on the cross-lingual capabilities of the model
The classifier was compared with other classifiers on 2 additional genre datasets.
In cross-dataset and cross-lingual experiments, it was shown that the X-GENRE classifier, trained on all three datasets, outperforms classifiers that were trained on just one of the datasets.
Additionally, we evaluated the X-GENRE classifier on the multilingual X-GINCO dataset, which comprises samples of texts from the MaCoCu web corpora (http://hdl.handle.net/11356/1969). The X-GINCO dataset comprises 790 instances in 10 languages: Albanian, Croatian, Catalan, Greek, Icelandic, Macedonian, Maltese, Slovenian, Turkish, and Ukrainian. To evaluate performance on the genre labels, the dataset is balanced by label, and the vague label "Other" is not included. Additionally, instances that were predicted with a confidence score below 0.80 were not included in the test dataset. The evaluation shows high cross-lingual performance of the model, even when it is applied to languages that are unrelated to the training languages (English and Slovenian) and to texts in non-Latin scripts. The outlier is Maltese, on which the classifier does not perform well; we presume this is because Maltese is not included in the pretraining data of the XLM-RoBERTa model.

Per-label F1 scores and per-language macro-F1 (Maltese, mt, is shown last as the outlier):

| Genre label | ca | el | hr | is | mk | sl | sq | tr | uk | Avg | mt |
|---------------|------|------|------|------|------|------|------|------|------|------|------|
| News | 0.82 | 0.90 | 0.95 | 0.73 | 0.91 | 0.90 | 0.89 | 0.95 | 1.00 | 0.89 | 0.69 |
| Opinion/Argumentation | 0.84 | 0.87 | 0.78 | 0.82 | 0.78 | 0.82 | 0.67 | 0.82 | 0.91 | 0.81 | 0.33 |
| Instruction | 0.75 | 0.71 | 0.75 | 0.78 | 1.00 | 1.00 | 0.95 | 0.90 | 0.95 | 0.86 | 0.69 |
| Information/Explanation | 0.72 | 0.70 | 0.95 | 0.50 | 0.84 | 0.90 | 0.80 | 0.82 | 1.00 | 0.80 | 0.52 |
| Promotion | 0.78 | 0.62 | 0.87 | 0.75 | 0.95 | 1.00 | 0.95 | 0.86 | 0.78 | 0.84 | 0.82 |
| Forum | 0.84 | 0.95 | 0.91 | 0.95 | 1.00 | 1.00 | 0.78 | 0.89 | 0.95 | 0.91 | 0.18 |
| Prose/Lyrical | 0.91 | 1.00 | 0.86 | 1.00 | 0.95 | 0.91 | 0.86 | 0.95 | 1.00 | 0.93 | 0.18 |
| Legal | 0.95 | 1.00 | 1.00 | 0.84 | 0.95 | 0.95 | 0.95 | 1.00 | 1.00 | 0.96 | / |
| Macro-F1 | 0.83 | 0.84 | 0.88 | 0.80 | 0.92 | 0.94 | 0.85 | 0.90 | 0.95 | 0.87 | 0.49 |
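The evaluation protocol described above (drop instances whose prediction confidence is below 0.80, then score each label by F1 and macro-average) can be sketched in plain Python. The `confidence`, `gold`, and `pred` keys below are hypothetical stand-ins for however predictions are stored:

```python
from collections import defaultdict

def keep_confident(instances, threshold=0.80):
    """Drop instances whose predicted label's confidence is below the threshold."""
    return [x for x in instances if x["confidence"] >= threshold]

def per_label_f1(gold, pred):
    """Per-label F1 scores and their macro average over all seen labels."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative
    f1 = {}
    for lab in sorted(set(gold) | set(pred)):
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1[lab] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return f1, sum(f1.values()) / len(f1)
```

Macro-F1 weights every genre equally regardless of frequency, which is why the label-balanced X-GINCO test set makes the per-language averages directly comparable.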
### Fine-tuning hyperparameters

Fine-tuning was performed with `simpletransformers`.
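As a rough sketch of how such a run is configured with `simpletransformers`, the snippet below sets up a `ClassificationModel` args dict. The values are illustrative placeholders, not the hyperparameters actually used for X-GENRE, and the model-construction lines are commented out because they download the base model:

```python
# Illustrative simpletransformers configuration; the values below are
# placeholders, NOT the hyperparameters actually used for X-GENRE.
model_args = {
    "num_train_epochs": 10,
    "learning_rate": 1e-5,
    "max_seq_length": 512,
    "train_batch_size": 8,
    "overwrite_output_dir": True,
}

# from simpletransformers.classification import ClassificationModel
# model = ClassificationModel(
#     "xlmroberta", "xlm-roberta-base",    # base model assumed here
#     num_labels=9,                        # 9 X-GENRE labels assumed
#     args=model_args,
#     use_cuda=True,
# )
# model.train_model(train_df)  # train_df needs "text" and "labels" columns
```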