metadata
library_name: transformers
tags: []
BiGenderDetection Model Card
Model Summary
This is a fine-tuned version of the dccuchile/bert-base-spanish-wwm-cased model for binary gender classification. The model was trained on a Spanish biomedical dataset to classify text into two categories: female and male.
Model Details
- Base Model: dccuchile/bert-base-spanish-wwm-cased
- Architecture: FineBERT (custom classifier layers)
- Number of Labels: 2 (female, male)
- Language: Spanish
- Problem Type: Single-label classification
- Maximum Sequence Length: 512
- Dropout: 0.4
- Activation Function: ReLU
- Output Dimension: 1
- BERT Frozen: No
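The card does not include the FineBERT class itself. As a rough illustration of how the settings above could fit together, the following sketch builds a classification head with the listed dropout (0.4), ReLU activation, and a single output unit. The layer layout, the `hidden_size` of 768 (the usual size for BERT-base), and the name `ClassifierHeadSketch` are assumptions, not the author's implementation. The encoder is replaced by a random tensor so the snippet runs without downloading weights.

```python
import torch
from torch import nn


class ClassifierHeadSketch(nn.Module):
    """Hypothetical head: Linear -> ReLU -> Dropout -> Linear(1)."""

    def __init__(self, hidden_size: int = 768, dropout: float = 0.4, output_dim: int = 1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, output_dim),
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # Returns one raw logit per example; a sigmoid turns it into a probability.
        return self.layers(pooled)


torch.manual_seed(0)
head = ClassifierHeadSketch().eval()  # eval() disables dropout
pooled = torch.randn(2, 768)          # stand-in for the encoder's pooled output
with torch.no_grad():
    logits = head(pooled)
    probs = torch.sigmoid(logits)
print(logits.shape)
```

With an output dimension of 1, the two classes are distinguished by whether the sigmoid of the logit falls above or below 0.5, which is consistent with the binary cross entropy loss listed under Training Details.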
Training Details
- Dataset: Custom dataset derived from the SPACCC corpus, preprocessed to exclude examples with undetermined gender labels.
- Training Epochs: 25
- Batch Size: 8
- Learning Rate: 2e-5
- Optimizer: AdamW
- Loss Function: Binary Cross Entropy Loss (BCELoss)
- Weight Decay: 0.01
- Warmup Steps: 0
- Scheduler Factor: 0.5
- Scheduler Patience: 2
- Early Stopping Patience: 8
- Evaluation Strategy: Per epoch
- Device: CUDA
- Framework: 🤗 Transformers
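The scheduler factor and patience listed above match the parameters of PyTorch's `ReduceLROnPlateau`, though the card does not name the scheduler. The sketch below wires the listed hyperparameters together under that assumption, using a tiny linear model and random data as placeholders for FineBERT and the SPACCC-derived dataset. The loop structure, the monitored quantity (validation loss), and all variable names are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder model and data; the real setup fine-tunes FineBERT on text.
model = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())  # BCELoss expects probabilities
x_train, y_train = torch.randn(32, 16), torch.randint(0, 2, (32, 1)).float()
x_val, y_val = torch.randn(16, 16), torch.randint(0, 2, (16, 1)).float()

criterion = nn.BCELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
# Assumed scheduler: halve the learning rate after 2 epochs without improvement.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)

max_epochs, batch_size, early_stop_patience = 25, 8, 8
best_val, epochs_without_improvement, epochs_run = float("inf"), 0, 0

for epoch in range(max_epochs):
    model.train()
    for start in range(0, len(x_train), batch_size):
        xb, yb = x_train[start:start + batch_size], y_train[start:start + batch_size]
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    # Evaluate once per epoch, as the card specifies.
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()
    scheduler.step(val_loss)
    epochs_run += 1

    if val_loss < best_val:
        best_val, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= early_stop_patience:
            break  # early stopping

final_lr = optimizer.param_groups[0]["lr"]
print(f"epochs run: {epochs_run}, final lr: {final_lr:.2e}")
```

Because the scheduler patience (2) is shorter than the early stopping patience (8), the learning rate can be halved several times before training stops.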
Model Usage
The model is designed for gender classification in Spanish biomedical texts. Given an input text, it predicts one of two classes: female or male.
How to Use
from transformers import AutoTokenizer
import torch

# FineBERTModel and FineBERTConfig are the custom classes used for this model
from model import FineBERTModel
from utils.import_config import FineBERTConfig

# Load configuration
config = FineBERTConfig.from_pretrained("path/to/saved_models/BiGenderDetection")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
model = FineBERTModel.from_pretrained("path/to/saved_models/BiGenderDetection", config=config)
model.eval()  # disable dropout for inference

# Tokenize an example sentence
text = "Paciente femenina de 45 años con antecedentes de hipertensión."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get predictions
with torch.no_grad():
    logits = model.get_logits(**inputs)
    # Sigmoid gives a probability; rounding yields the binary class (0 or 1)
    prediction = torch.round(torch.sigmoid(logits)).cpu().numpy()
print(prediction)
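The snippet above prints a rounded sigmoid output, that is, 0 or 1. The card does not say which value corresponds to which class, so the helper below takes the label order as an explicit parameter. The default order `("female", "male")` is only a placeholder assumption taken from the order in which the card lists the classes, and `decode_prediction` is a hypothetical name.

```python
import math


def decode_prediction(logit: float, labels=("female", "male"), threshold: float = 0.5):
    """Map one raw logit to (label, probability of labels[1]).

    ASSUMPTION: labels[1] is the positive class (sigmoid output near 1).
    Verify this against the training code before relying on it.
    """
    prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return labels[1 if prob > threshold else 0], prob


label, prob = decode_prediction(2.0)
print(label, round(prob, 3))
```

Returning the probability alongside the label makes it easier to spot borderline cases near the 0.5 threshold.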
Limitations
- The model is trained on Spanish biomedical text and may not generalize well to other domains.
- Gender classification based on text is inherently challenging and may be influenced by biases in the training data.
Acknowledgments
This model is based on dccuchile/bert-base-spanish-wwm-cased and fine-tuned on biomedical data derived from the SPACCC corpus.