---
library_name: transformers
tags: []
---
# BiGenderDetection Model Card

## Model Summary

This is a fine-tuned version of the `dccuchile/bert-base-spanish-wwm-cased` model for binary gender classification. The model was trained on a Spanish biomedical dataset to classify text into two categories: female and male.

## Model Details

- **Base Model:** `dccuchile/bert-base-spanish-wwm-cased`
- **Architecture:** FineBERT (custom classifier layers)
- **Number of Labels:** 2 (female, male)
- **Language:** Spanish
- **Problem Type:** Single-label classification
- **Maximum Sequence Length:** 512
- **Dropout:** 0.4
- **Activation Function:** ReLU
- **Output Dimension:** 1
- **BERT Frozen:** No

## Training Details

- **Dataset:** Custom dataset derived from the SPACCC corpus, preprocessed to exclude undetermined labels.
- **Training Epochs:** 25
- **Batch Size:** 8
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW
- **Loss Function:** Binary Cross Entropy Loss (BCELoss)
- **Weight Decay:** 0.01
- **Warmup Steps:** 0
- **Scheduler Factor:** 0.5
- **Scheduler Patience:** 2
- **Early Stopping Patience:** 8
- **Evaluation Strategy:** Per epoch
- **Device:** CUDA
- **Framework:** 🤗 Transformers
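
The scheduler and early-stopping settings above interact, so a small simulation may help show how. This card does not name the scheduler or the monitored metric. The sketch below assumes a reduce-on-plateau rule driven by per-epoch validation loss, and early stopping that counts consecutive epochs without a new best. The function name `simulate_training` and the loss values are hypothetical, and the logic is a simplified illustration rather than the actual training loop.

```python
def simulate_training(val_losses, lr=2e-5, factor=0.5,
                      sched_patience=2, stop_patience=8):
    """Replay per-epoch validation losses.

    Returns (last_epoch_run, final_lr, lr_per_epoch).
    ASSUMPTION: plateau-style scheduling and loss-based early stopping;
    the model card does not confirm either detail.
    """
    best = float("inf")
    bad_sched = 0  # non-improving epochs since the last LR cut or new best
    bad_stop = 0   # non-improving epochs since the last new best
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_sched = 0
            bad_stop = 0
        else:
            bad_sched += 1
            bad_stop += 1
        # Cut the LR once the plateau has lasted longer than the patience.
        if bad_sched > sched_patience:
            lr *= factor
            bad_sched = 0
        history.append(lr)
        # Stop once no new best has been seen for `stop_patience` epochs.
        if bad_stop >= stop_patience:
            return epoch, lr, history
    return len(val_losses), lr, history


# Hypothetical run: three improving epochs, then a plateau, capped at 25 epochs.
losses = [0.60, 0.50, 0.40] + [0.45] * 22
last_epoch, final_lr, lr_history = simulate_training(losses)
print(last_epoch, final_lr)
```

Under these assumptions, the learning rate is halved twice during the plateau before early stopping ends the run, well short of the 25-epoch cap.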
## Model Usage

The model is designed for gender classification in Spanish biomedical texts. Given an input text, it predicts one of two classes: female or male.

## How to Use

```python
from transformers import AutoTokenizer
import torch

from model import FineBERTModel  # Import your custom model class
from utils.import_config import FineBERTConfig

# Load configuration
config = FineBERTConfig.from_pretrained("path/to/saved_models/BiGenderDetection")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
model = FineBERTModel.from_pretrained("path/to/saved_models/BiGenderDetection", config=config)
model.eval()  # Disable dropout for inference

# Tokenize the input, truncating to the 512-token maximum sequence length
text = "Paciente femenina de 45 años con antecedentes de hipertensión."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get predictions
with torch.no_grad():
    logits = model.get_logits(**inputs)

# Apply the sigmoid to the single output logit and round to a hard 0/1 prediction
prediction = torch.round(torch.sigmoid(logits)).detach().numpy()

print(prediction)
```
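
The snippet above prints a raw 0/1 value, so a small decoding helper may make the output easier to read. The sketch below uses only the standard library and takes a single logit as a plain float. The index-to-label mapping is an assumption taken from the order in which this card lists the classes (female, male); the card does not state which class is encoded as 1, so check it against the training label encoding. The names `decode_prediction` and `ASSUMED_ID2LABEL` are hypothetical.

```python
import math

# ASSUMPTION: label order follows the card's listing "(female, male)".
# The card does not say which class maps to 1; verify before relying on this.
ASSUMED_ID2LABEL = {0: "female", 1: "male"}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_prediction(logit: float, id2label: dict, threshold: float = 0.5):
    """Map one logit to (label, probability of class 1).

    Uses a strict comparison so that a probability of exactly 0.5 maps to
    class 0, matching torch.round, which rounds halves to the nearest even value.
    """
    prob = sigmoid(logit)
    idx = 1 if prob > threshold else 0
    return id2label[idx], prob


# Hypothetical logit value, for illustration only.
label, prob = decode_prediction(2.0, ASSUMED_ID2LABEL)
print(label, round(prob, 4))
```

With the usage snippet above, the input would be something like `logits.item()`, assuming `get_logits` returns a single value for a single input text. Returning the probability alongside the label also makes it possible to flag low-confidence predictions instead of forcing a hard decision.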
## Limitations

- The model is trained on Spanish biomedical text and may not generalize well to other domains.
- Gender classification based on text is inherently challenging and may be influenced by biases in the training data.

## Acknowledgments

This model is based on `dccuchile/bert-base-spanish-wwm-cased` and fine-tuned on biomedical data derived from the SPACCC corpus.