---
license: cc-by-nc-4.0
language:
- es
tags:
- simplification
- NER
---

This is a model for **complex word identification (CWI)** in Spanish medical texts, based on [multilingual DeBERTa v3 (mDeBERTa)](https://huggingface.co/microsoft/mdeberta-v3-base).
The model was fine-tuned on a corpus of 225 texts for patients (162,575 tokens) to identify **complex words** (**CW**).

**Results (test set)**

| Class | Precision | Recall | F1 | Accuracy |
|:-----:|:-------------:|:-------------:|:-------------:|:-------------:|
| CW | 79.05 (±1.39) | 79.01 (±0.70) | 79.02 (±0.65) | 94.86 (±0.22) |

*Results are the average of 3 experimental rounds.*

If you use this model, or want more details about the experiments and the training setup, please see our article:

```
@article{2025CWI,
  title={Complex Word Identification for Lexical Simplification in Spanish Texts for Patients},
  author={Ortega-Riba, Federico and Campillos-Llanos, Leonardo and Samy, Doaa},
  journal={Procesamiento del lenguaje natural},
  volume={74},
  year={2025}
}
```
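Because the model identifies complex words as token-level labels (the card tags it as NER), its raw output has to be grouped into word or phrase spans before it is useful for simplification. The sketch below shows one way to do that, under stated assumptions: the label names `B-CW`, `I-CW` and `O` (a BIO scheme) are hypothetical and not confirmed by this card, and `extract_complex_words` is an illustrative helper, not part of any library. It expects one `(char_start, char_end, label)` triple per token, including `O` tokens; such triples could, for example, be built from the model's predicted labels and the tokenizer's character offsets. The sentence and labels in the usage example are hand-written toy data, not real model output.

```python
from typing import List, Tuple

# Hypothetical per-token prediction: (char_start, char_end, label).
Prediction = Tuple[int, int, str]


def extract_complex_words(text: str, predictions: List[Prediction]) -> List[str]:
    """Group token-level BIO predictions into complex-word (CW) spans.

    Assumes the labels are "B-CW", "I-CW" and "O"; this label scheme is an
    illustration only and is not confirmed by the model card.
    """
    spans: List[List[int]] = []
    inside = False  # True while the previous token belongs to a CW span
    for start, end, label in sorted(predictions):
        if label == "B-CW" or (label == "I-CW" and not inside):
            spans.append([start, end])  # open a new span
            inside = True
        elif label == "I-CW":
            spans[-1][1] = end  # extend the current span to this token
        else:
            inside = False  # "O" (or any other label) closes the span
    # Slice the original text so accents and spacing are preserved.
    return [text[s:e] for s, e in spans]


if __name__ == "__main__":
    sentence = "El paciente presenta hipertensión arterial y disnea."
    # Hand-written toy labels for illustration only (not real model output).
    toy_predictions = [
        (0, 2, "O"), (3, 11, "O"), (12, 20, "O"),
        (21, 33, "B-CW"), (34, 42, "I-CW"),
        (43, 44, "O"), (45, 51, "B-CW"), (51, 52, "O"),
    ]
    print(extract_complex_words(sentence, toy_predictions))
    # → ['hipertensión arterial', 'disnea']
```

Slicing the original text by character offsets, rather than joining subword strings, avoids artifacts such as the `▁` word-boundary marker used by the mDeBERTa tokenizer. If you run the model through the `transformers` `pipeline("token-classification", ...)` API instead, passing `aggregation_strategy="simple"` asks the pipeline to perform a similar grouping itself.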