---
license: apache-2.0
language:
- sr
metrics:
- f1
- accuracy
base_model:
- classla/bcms-bertic
pipeline_tag: token-classification
library_name: transformers
tags:
- legal
---

# BERTić-COMtext-SR-legal-NER-ekavica

**BERTić-COMtext-SR-legal-NER-ekavica** is a variant of the [BERTić](https://huggingface.co/classla/bcms-bertic) model, fine-tuned on the task of named entity recognition in Serbian legal texts written in the Ekavian pronunciation.

The model was fine-tuned for 20 epochs on the Ekavian variant of the [COMtext.SR.legal](https://github.com/ICEF-NLP/COMtext.SR) dataset.
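
The model can be loaded through the Hugging Face `transformers` token-classification pipeline. The snippet below is a minimal sketch; `MODEL_ID` is a placeholder for this repository's identifier on the Hugging Face Hub, and the example sentence is illustrative only.

```python
from transformers import pipeline

MODEL_ID = "..."  # placeholder: replace with this repository's Hub identifier

# Token-classification pipeline; aggregation_strategy="simple" merges the
# B-/I- pieces of each entity into a single span.
ner = pipeline("token-classification", model=MODEL_ID, aggregation_strategy="simple")

# Illustrative Ekavian sentence (not taken from the corpus).
text = "Osnovni sud u Novom Sadu doneo je presudu 15. marta 2021. godine."

for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```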

# Benchmarking

This model was evaluated on the task of named entity recognition in Serbian legal texts.

The model uses a newly developed named entity schema consisting of 21 entity types, tailored for the domain of Serbian legal texts and encoded according to the IOB2 standard.

The full entity list is available on the [COMtext.SR GitHub repository](https://github.com/ICEF-NLP/COMtext.SR).

This model was compared with [SrBERTa](http://huggingface.co/nemanjaPetrovic/SrBERTa), a model trained specifically on Serbian legal texts, which was likewise fine-tuned for 20 epochs for named entity recognition on the Ekavian variant of the [COMtext.SR.legal](https://github.com/ICEF-NLP/COMtext.SR) corpus. Token-level accuracy and F1 (macro-averaged and per-class) were used as evaluation metrics, with gold-tokenized text taken as input.

Two evaluation settings were considered for both models:

* Default - only the entity type portion of the NE tag is considered, effectively ignoring the "B-" and "I-" prefixes
* Strict - the entire NE tag is considered

For the strict setting, per-class results are given separately for the B-CLASS and I-CLASS variants of each tag.

In addition, macro-averaged F1 scores are presented in two variants - one where the O (outside) class is ignored, and another where it is treated equally to the other named entity classes.
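
To make these settings concrete, the sketch below shows one way the reported metrics can be computed with scikit-learn. It assumes `gold` and `pred` are flat lists of IOB2 tags over the gold-tokenized text; the helper names are illustrative rather than taken from the released evaluation code.

```python
from sklearn.metrics import accuracy_score, f1_score

def strip_prefix(tag):
    # "Default" setting: keep only the entity type, dropping the B-/I- prefix.
    return tag.split("-", 1)[1] if "-" in tag else tag

def evaluate(gold, pred, strict=False):
    # gold, pred: flat lists of IOB2 tags, e.g. ["B-PER", "I-PER", "O", ...]
    g = gold if strict else [strip_prefix(t) for t in gold]
    p = pred if strict else [strip_prefix(t) for t in pred]
    labels_without_o = sorted(set(g) - {"O"})
    return {
        "accuracy": accuracy_score(g, p),
        "macro_f1_with_o": f1_score(g, p, average="macro", zero_division=0),
        "macro_f1_without_o": f1_score(g, p, labels=labels_without_o,
                                       average="macro", zero_division=0),
    }
```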

BERTić-COMtext-SR-legal-NER-ekavica and SrBERTa were fine-tuned and evaluated on the COMtext.SR.legal.ekavica corpus using 10-fold cross-validation.

The code and data needed to run these experiments are available on the [COMtext.SR GitHub repository](https://github.com/ICEF-NLP/COMtext.SR).

## Results

| Metrics | BERTić-COMtext-SR-legal-NER-ekavica (default) | BERTić-COMtext-SR-legal-NER-ekavica (strict) | SrBERTa (default) | SrBERTa (strict) |
| -------------------- | --------------------------------------------- | -------------------------------------------- | ----------------- | ---------------- |
| Accuracy | **0.9849** | 0.9837 | 0.9685 | 0.9670 |
| Macro F1 (with O) | **0.8522** | 0.8418 | 0.7270 | 0.7152 |
| Macro F1 (without O) | **0.8355** | 0.8335 | 0.7033 | 0.7028 |
| *Per-class F1* | | | | |
| PER | 0.9811 | 0.9734 / 0.9713 | 0.8695 | 0.8216 / 0.8901 |
| LOC | 0.9027 | 0.9016 / 0.8520 | 0.6858 | 0.6770 / 0.6557 |
| ADR | 0.9252 | 0.8803 / 0.9168 | 0.8448 | 0.7841 / 0.8297 |
| COURT | 0.9450 | 0.9424 / 0.9408 | 0.7809 | 0.7440 / 0.7867 |
| INST | 0.7848 | 0.7912 / 0.8087 | 0.6346 | 0.6487 / 0.6376 |
| COM | 0.7577 | 0.6932 / 0.7435 | 0.4719 | 0.3685 / 0.4461 |
| OTHORG | 0.4458 | 0.3223 / 0.5464 | 0.3054 | 0.2471 / 0.3597 |
| LAW | 0.9583 | 0.9565 / 0.9572 | 0.9133 | 0.8793 / 0.9130 |
| REF | 0.8315 | 0.7611 / 0.8200 | 0.7706 | 0.6386 / 0.7609 |
| IDPER | 0.9630 | 0.9630 / N/A | 1.0000 | 1.0000 / N/A |
| IDCOM | 0.9779 | 0.9779 / N/A | 0.9018 | 0.9018 / N/A |
| IDTAX | 1.0000 | 1.0000 / N/A | 0.9667 | 0.9667 / N/A |
| NUMACC | 1.0000 | 1.0000 / N/A | 0.6667 | 0.6667 / N/A |
| NUMDOC | 0.5333 | 0.5333 / N/A | 0.3333 | 0.3333 / N/A |
| NUMCAR | 0.6111 | 0.5079 / 0.4286 | 0.3879 | 0.4333 / 0.0000 |
| NUMPLOT | 0.7143 | 0.7143 / N/A | 0.4928 | 0.4928 / N/A |
| IDOTH | 0.6161 | 0.6161 / N/A | 0.3967 | 0.3967 / N/A |
| CONTACT | 0.8000 | 0.8000 / N/A | 0.1333 | 0.1333 / N/A |
| DATE | 0.9602 | 0.9383 / 0.9544 | 0.9491 | 0.9079 / 0.9492 |
| MONEY | 0.9703 | 0.9543 / 0.9662 | 0.8885 | 0.8926 / 0.8852 |
| MISC | 0.4445 | 0.4032 / 0.4149 | 0.2113 | 0.2154 / 0.1962 |
| O | 0.9946 | 0.9946 | 0.9870 | 0.9870 |