This is a model for complex word identification (CWI) in Spanish medical texts, built on BETO, a Spanish BERT model.

The model was fine-tuned to identify complex words (CW) on a corpus of 225 patient-oriented texts (162,575 tokens).
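A minimal usage sketch with the Hugging Face `transformers` token-classification pipeline follows. It assumes the checkpoint loads as a token-classification model. This card states neither the model's Hub ID nor its label names, so `MODEL_ID` is a hypothetical placeholder, and the complex-word label is assumed to be `CW`. Check `model.config.id2label` and adjust `CW_LABEL` if it differs (for example `LABEL_1`). `extract_complex_words` and `tag_complex_words` are illustrative helper names, not part of any published API.

```python
# Minimal sketch: tag complex words (CW) with a transformers token-classification pipeline.
# ASSUMPTIONS (not stated in the model card):
#   - MODEL_ID is a hypothetical placeholder; replace it with this model's actual Hub ID.
#   - The complex-word label is named "CW" (or B-CW/I-CW, which aggregation merges into "CW").
from typing import Dict, List

MODEL_ID = "your-namespace/your-cwi-model"  # hypothetical placeholder
CW_LABEL = "CW"  # assumed label name; verify against model.config.id2label


def extract_complex_words(entities: List[Dict], label: str = CW_LABEL,
                          min_score: float = 0.0) -> List[str]:
    """Return the words grouped under the complex-word label.

    `entities` has the shape produced by a token-classification pipeline built with
    aggregation_strategy="simple": dicts with 'entity_group', 'score', 'word', 'start', 'end'.
    `min_score` optionally drops low-confidence spans (0.0 keeps everything).
    """
    return [e["word"] for e in entities
            if e["entity_group"] == label and e["score"] >= min_score]


def tag_complex_words(text: str, model_id: str = MODEL_ID) -> List[str]:
    # Imported lazily so extract_complex_words works without transformers installed.
    from transformers import pipeline

    tagger = pipeline("token-classification", model=model_id,
                      aggregation_strategy="simple")
    return extract_complex_words(tagger(text))
```

Usage, once `MODEL_ID` points at the real checkpoint: `tag_complex_words("El paciente presenta disnea y taquicardia.")`. Keeping the post-processing in a separate function means the label filtering can be inspected or changed without reloading the model.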

Results (test set)

Class   Precision (%)   Recall (%)      F1 (%)          Accuracy (%)
CW      75.01 (±1.11)   82.98 (±0.60)   78.78 (±0.34)   93.49 (±0.13)

* Results are averaged over 3 experimental runs.
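As a quick consistency check on the results table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Plugging in the averaged CW precision and recall gives about 78.79, close to the reported 78.78. The small gap is expected if, as seems likely, the reported F1 is the mean of the per-run F1 scores rather than the F1 of the mean precision and recall. The function name `f1_score` below is illustrative.

```python
# Sketch: F1 as the harmonic mean of precision (P) and recall (R): F1 = 2PR / (P + R).
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; inputs and output share units (here, percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Averaged CW precision and recall from the results table (percent).
print(round(f1_score(75.01, 82.98), 2))  # 78.79
```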

If you use this model, or want more details about the experiments and training setup, see our article:

@article{2025CWI,
  title={Complex Word Identification for Lexical Simplification in Spanish Texts for Patients},
  author={Ortega-Riba, Federico and Campillos-Llanos, Leonardo and Samy, Doaa},
  journal={Procesamiento del lenguaje natural},
  volume={74},
  year={2025}
}

Model size: 109M parameters (Safetensors, F32 tensors).