Tags: Text Classification · Transformers · PyTorch · Safetensors · xlm-roberta · genre · text-genre
TajaKuzman committed (verified) · Commit 68c4a0f · 1 Parent(s): 8784f89

Added information on the cross-lingual capabilities of the model

Files changed (1): README.md (+24 −0)
README.md CHANGED
@@ -275,6 +275,30 @@ The classifier was compared with other classifiers on 2 additional genre datasets
  In cross-dataset and cross-lingual experiments, it was shown that the X-GENRE classifier,
  trained on all three datasets, outperforms classifiers that were trained on just one of the datasets.
 
+ Additionally, we evaluated the X-GENRE classifier on the multilingual X-GINCO dataset, which comprises
+ text samples from the MaCoCu web corpora (http://hdl.handle.net/11356/1969).
+ The X-GINCO dataset consists of 790 instances in 10 languages:
+ Albanian, Croatian, Catalan, Greek, Icelandic, Macedonian, Maltese, Slovenian, Turkish, and Ukrainian.
+ To evaluate performance per genre label, the dataset is balanced by label,
+ and the vague label "Other" is not included.
+ Additionally, instances that were predicted with a confidence score below 0.80 were not included in the test dataset.
+ The evaluation shows high cross-lingual performance of the model,
+ even when it is applied to languages that are unrelated to the training languages (English and Slovenian) and to non-Latin scripts.
+ The outlier is Maltese, on which the classifier does not perform well;
+ we presume that this is because Maltese is not included in the pretraining data of the XLM-RoBERTa model.
+
+ | Genre label | ca | el | hr | is | mk | sl | sq | tr | uk | Avg | mt |
+ |---------------|------|------|------|------|------|------|------|------|------|------|------|
+ | News | 0.82 | 0.90 | 0.95 | 0.73 | 0.91 | 0.90 | 0.89 | 0.95 | 1.00 | 0.89 | 0.69 |
+ | Opinion/Argumentation | 0.84 | 0.87 | 0.78 | 0.82 | 0.78 | 0.82 | 0.67 | 0.82 | 0.91 | 0.81 | 0.33 |
+ | Instruction | 0.75 | 0.71 | 0.75 | 0.78 | 1.00 | 1.00 | 0.95 | 0.90 | 0.95 | 0.86 | 0.69 |
+ | Information/Explanation | 0.72 | 0.70 | 0.95 | 0.50 | 0.84 | 0.90 | 0.80 | 0.82 | 1.00 | 0.80 | 0.52 |
+ | Promotion | 0.78 | 0.62 | 0.87 | 0.75 | 0.95 | 1.00 | 0.95 | 0.86 | 0.78 | 0.84 | 0.82 |
+ | Forum | 0.84 | 0.95 | 0.91 | 0.95 | 1.00 | 1.00 | 0.78 | 0.89 | 0.95 | 0.91 | 0.18 |
+ | Prose/Lyrical | 0.91 | 1.00 | 0.86 | 1.00 | 0.95 | 0.91 | 0.86 | 0.95 | 1.00 | 0.93 | 0.18 |
+ | Legal | 0.95 | 1.00 | 1.00 | 0.84 | 0.95 | 0.95 | 0.95 | 1.00 | 1.00 | 0.96 | / |
+ | Macro-F1 | 0.83 | 0.84 | 0.88 | 0.80 | 0.92 | 0.94 | 0.85 | 0.90 | 0.95 | 0.87 | 0.49 |
+
  ### Fine-tuning hyperparameters
 
  Fine-tuning was performed with `simpletransformers`.
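The confidence-based filtering described in the added text (instances predicted with a confidence score below 0.80 are excluded) can be sketched as below. The 0.80 threshold comes from the diff; the record format mirrors the list-of-dicts output of a Hugging Face text-classification pipeline, and the helper name and example records are illustrative assumptions, not part of X-GINCO.

```python
# Sketch of the test-set filtering step: keep only instances whose top-label
# confidence meets the 0.80 threshold stated in the model card.
CONFIDENCE_THRESHOLD = 0.80


def filter_by_confidence(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Keep predictions whose confidence score is at or above the threshold.

    `predictions` is a list of {"label": str, "score": float} dicts, the same
    shape a Hugging Face text-classification pipeline returns per instance.
    """
    return [p for p in predictions if p["score"] >= threshold]


# Illustrative records (hypothetical, not taken from X-GINCO):
predictions = [
    {"label": "News", "score": 0.95},
    {"label": "Promotion", "score": 0.55},  # dropped: below 0.80
    {"label": "Legal", "score": 0.88},
]

kept = filter_by_confidence(predictions)
print([p["label"] for p in kept])  # ['News', 'Legal']
```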