---
license: cc-by-nc-4.0
language:
- es
tags:
- simplification
- NER
---
This is a model for **complex word identification (CWI)** in Spanish medical texts, based on
[multilingual DeBERTa v3 (mDeBERTa)](https://huggingface.co/microsoft/mdeberta-v3-base).
It was fine-tuned on a corpus of 225 texts written for patients (162,575 tokens) to identify **complex words** (**CW**).
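The card does not document the model's output label set. As a minimal sketch, assuming hypothetical BIO-style tags (`B-CW`, `I-CW`, `O`, suggested only by the `NER` tag in the metadata) and tokens already aligned to words, the snippet below groups token-level predictions into complex-word spans. The function name `extract_complex_words` is illustrative and not part of the model's API.

```python
from typing import List, Tuple

# Hypothetical label scheme (not documented in this card):
# "B-CW" starts a complex word, "I-CW" continues it, "O" is any other token.
def extract_complex_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Group consecutive CW-tagged tokens into complex-word spans."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in tagged:
        if tag == "B-CW":
            # A new span begins; close any span that is still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-CW":
            # Continue the open span (an I- tag with no open span starts one).
            current.append(token)
        else:
            # Any other tag closes the open span, if there is one.
            if current:
                spans.append(" ".join(current))
                current = []
    if current:
        spans.append(" ".join(current))
    return spans


example = [("El", "O"), ("paciente", "O"), ("presenta", "O"),
           ("disnea", "B-CW"), ("y", "O"),
           ("insuficiencia", "B-CW"), ("cardíaca", "I-CW"), (".", "O")]
print(extract_complex_words(example))  # ['disnea', 'insuficiencia cardíaca']
```

In practice the `(token, tag)` pairs would come from running the model, for example through the Hugging Face `transformers` token-classification pipeline; check the model's `config.json` (`id2label`) for the actual tag names before relying on this mapping.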

**Results (test set)**

| Class | Precision (%) | Recall (%) | F1 (%) | Accuracy (%) |
|:-----:|:-------------:|:-------------:|:-------------:|:-------------:|
| CW | 79.05 (±1.39) | 79.01 (±0.70) | 79.02 (±0.65) | 94.86 (±0.22) |

\*Results are averaged over 3 experimental rounds.

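F1 is the harmonic mean of precision and recall. Recomputing it from the averaged precision and recall in the table gives about 79.03, slightly above the reported 79.02; this is consistent with the F1 column being the mean of per-round F1 scores rather than the F1 of the mean scores, although the card does not state this explicitly. The sketch below shows the arithmetic. `summarize_rounds` is a hypothetical helper that assumes the ± values are sample standard deviations across rounds, which the card does not specify.

```python
from statistics import mean, stdev
from typing import List


def f1_score(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def summarize_rounds(scores: List[float]) -> str:
    """Format per-round scores as 'mean (±sample std)'.

    Assumption: the ± in the table is a sample standard deviation;
    requires at least two rounds.
    """
    return f"{mean(scores):.2f} (±{stdev(scores):.2f})"


# F1 recomputed from the averaged precision and recall in the table.
print(round(f1_score(79.05, 79.01), 2))  # 79.03
```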
If you use this model, or for more details on the experiments and training setup, please see (and cite) our article:
```bibtex
@article{2025CWI,
title={Complex Word Identification for Lexical Simplification in Spanish Texts for Patients},
author={Ortega-Riba, Federico and Campillos-Llanos, Leonardo and Samy, Doaa},
journal={Procesamiento del lenguaje natural},
volume={74},
year={2025}
}
```