metadata

library_name: transformers
tags: []

BiGenderDetection Model Card

Model Summary

This is a fine-tuned version of the dccuchile/bert-base-spanish-wwm-cased model for binary gender classification. The model was trained on a Spanish biomedical dataset to classify text into two categories: female and male.

Model Details

Base Model: dccuchile/bert-base-spanish-wwm-cased
Architecture: FineBERT (custom classifier layers)
Number of Labels: 2 (female, male)
Language: Spanish
Problem Type: Single-label classification
Maximum Sequence Length: 512
Dropout: 0.4
Activation Function: ReLU
Output Dimension: 1 (a single logit; its sigmoid is rounded to choose between the two labels)
BERT Frozen: No
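
With one output unit and two labels, the model appears to emit a single logit whose sigmoid is read as the probability of one class. The following is a minimal sketch of a head with that shape, not the actual FineBERT code. The layer order, the intermediate layer width, and the name `ClassifierHeadSketch` are assumptions. Only the dropout rate (0.4), the ReLU activation, the output dimension (1), and the 768-dimensional hidden size of a BERT-base encoder come from the card or the base model.

```python
import torch
from torch import nn


class ClassifierHeadSketch(nn.Module):
    """Hypothetical head: dropout -> linear -> ReLU -> dropout -> linear(1)."""

    def __init__(self, hidden_size: int = 768, dropout: float = 0.4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size),  # intermediate width is an assumption
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, 1),  # one logit for the binary decision
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden_size) sentence representation from the encoder
        return self.net(pooled).squeeze(-1)  # (batch,)


head = ClassifierHeadSketch().eval()  # eval() disables dropout
with torch.no_grad():
    logits = head(torch.randn(4, 768))  # random stand-in for encoder output
    probs = torch.sigmoid(logits)  # probability of the class encoded as 1
```

A single sigmoid output is enough for two mutually exclusive classes, because the probability of the other class is one minus the output.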

Training Details

Dataset: Custom dataset derived from the SPACCC corpus, preprocessed to exclude undetermined labels.
Training Epochs: 25
Batch Size: 8
Learning Rate: 2e-5
Optimizer: AdamW
Loss Function: Binary Cross Entropy Loss (BCELoss)
Weight Decay: 0.01
Warmup Steps: 0
Scheduler Factor: 0.5
Scheduler Patience: 2
Early Stopping Patience: 8
Evaluation Strategy: Per epoch
Device: CUDA
Framework: 🤗 Transformers
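
The scheduler factor and patience listed above match the parameters of a reduce-on-plateau learning-rate schedule, so the setup can be sketched roughly as below. This is an illustration under assumptions, not the original training script: it assumes `torch.optim.lr_scheduler.ReduceLROnPlateau`, assumes the monitored quantity is validation loss, and uses a small linear layer as a stand-in for the model. The function name `run_schedule` and the loss values are hypothetical.

```python
import torch
from torch import nn

MAX_EPOCHS = 25
EARLY_STOPPING_PATIENCE = 8


def run_schedule(val_losses):
    """Replay per-epoch validation losses; return (epochs_run, final_lr)."""
    model = nn.Linear(768, 1)  # stand-in for the fine-tuned model
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
    # Halve the learning rate once the loss has not improved for more than 2 epochs
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=2
    )

    best = float("inf")
    bad_epochs = 0
    epochs_run = 0
    for val_loss in val_losses[:MAX_EPOCHS]:
        epochs_run += 1
        # (training and evaluation for one epoch would happen here)
        scheduler.step(val_loss)
        if val_loss < best:
            best = val_loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= EARLY_STOPPING_PATIENCE:
                break  # early stopping
    return epochs_run, optimizer.param_groups[0]["lr"]


# Made-up losses: two improving epochs, then a plateau
epochs_run, final_lr = run_schedule([0.7, 0.6] + [0.65] * 23)
```

With a plateau like this one, the learning rate is reduced before early stopping ends the run, which is why the early stopping patience (8) is set larger than the scheduler patience (2).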

Model Usage

The model is designed for gender classification in Spanish biomedical texts. Given an input text, it predicts one of two classes: female or male.

How to Use

import torch
from transformers import AutoTokenizer

from model import FineBERTModel  # custom model class from this project
from utils.import_config import FineBERTConfig  # custom config class from this project

# Load the saved configuration
config = FineBERTConfig.from_pretrained("path/to/saved_models/BiGenderDetection")

# Load the base tokenizer and the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
model = FineBERTModel.from_pretrained("path/to/saved_models/BiGenderDetection", config=config)
model.eval()  # make sure dropout is disabled for inference

text = "Paciente femenina de 45 años con antecedentes de hipertensión."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get predictions: sigmoid maps the single logit to a probability,
# and rounding turns it into a class index (0 or 1)
with torch.no_grad():
    logits = model.get_logits(**inputs)
    prediction = torch.round(torch.sigmoid(logits)).numpy()

print(prediction)
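
The snippet above prints a class index rather than a label name. A small helper along the following lines could turn a raw logit into a readable label. The mapping in `ID2LABEL` is hypothetical, since this card does not state which class is encoded as 1, so check it against the training labels before relying on it. The helper name `decode` is also an illustration.

```python
import math

# Hypothetical mapping: the card does not say which class is index 1.
ID2LABEL = {0: "female", 1: "male"}


def decode(logit: float, threshold: float = 0.5, id2label=ID2LABEL):
    """Convert one logit into (label, probability of class 1)."""
    prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    class_id = int(prob > threshold)  # a probability of exactly 0.5 maps to class 0
    return id2label[class_id], prob


label, prob = decode(2.0)
```

With the model loaded, the logit could be passed as `decode(logits.item())` for a single input text.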

Limitations

The model is trained on Spanish biomedical text and may not generalize well to other domains.
Gender classification based on text is inherently challenging and may be influenced by biases in the training data.

Acknowledgments

This model is based on dccuchile/bert-base-spanish-wwm-cased and fine-tuned on biomedical data derived from the SPACCC corpus.