---
license: cc-by-nc-4.0
language:
- es
tags:
- simplification
- NER
---

This is a model for **complex word identification (CWI)** in Spanish medical texts, based on 
[multilingual DeBERTa v3 (mDeBERTa)](https://huggingface.co/microsoft/mdeberta-v3-base).

The model was fine-tuned on a corpus of 225 texts for patients (162,575 tokens) to identify **complex words** (**CW**).
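
A minimal usage sketch follows, assuming the model is loaded through the `transformers` token-classification pipeline. The Hub id in `MODEL_ID` is a placeholder, and the label name `CW` is an assumption taken from the results table; check the `id2label` mapping in the model's configuration before relying on it. The helper names `load_tagger` and `complex_words` are hypothetical and only serve this illustration.

```python
from typing import Dict, List

# Placeholder: replace with the Hub id of this repository.
MODEL_ID = "<namespace>/<model-name>"
# Assumption: complex words are tagged with the label "CW".
CW_LABEL = "CW"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the filtering helper below works without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word pieces into whole spans
    # and reports the label of each span under the "entity_group" key.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def complex_words(
    predictions: List[Dict], label: str = CW_LABEL, min_score: float = 0.0
) -> List[str]:
    """Keep the text of the spans tagged with `label` and scored at least `min_score`."""
    return [
        p["word"]
        for p in predictions
        if p.get("entity_group") == label and p["score"] >= min_score
    ]


# Typical call, once MODEL_ID points at the real repository:
#   tagger = load_tagger()
#   print(complex_words(tagger("El paciente presenta hipertensión arterial.")))
```

The `min_score` threshold is optional; raising it trades recall for precision on the predicted complex words.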

**Results (test set)**

| Class |   Precision   |     Recall    |       F1      |    Accuracy   |
|:-----:|:-------------:|:-------------:|:-------------:|:-------------:|
|  CW   | 79.05 (±1.39) | 79.01 (±0.70) | 79.02 (±0.65) | 94.86 (±0.22) |

\* Results are the average of 3 experimental rounds.
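
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the averaged precision and recall. It assumes the reported F1 is the mean of the per-round F1 scores, which would explain why the two values can differ slightly in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averaged precision and recall for the CW class, taken from the table above.
f1_from_means = f1_score(79.05, 79.01)
print(round(f1_from_means, 2))  # 79.03, versus the reported 79.02
```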

If you use this model, or want more details about the experiments and training, please see our article:

```
@article{2025CWI,
  title={Complex Word Identification for Lexical Simplification in Spanish Texts for Patients},
  author={Ortega-Riba, Federico and Campillos-Llanos, Leonardo and Samy, Doaa},
  journal={Procesamiento del lenguaje natural},
  volume={74},
  year={2025}
}
```