BiGenderDetection Model Card

Model Summary

This model is a fine-tuned version of dccuchile/bert-base-spanish-wwm-cased for binary gender classification. It was trained on a Spanish biomedical dataset and assigns each input text to one of two classes: female or male.

Model Details

  • Base Model: dccuchile/bert-base-spanish-wwm-cased
  • Architecture: FineBERT (custom classifier layers)
  • Number of Labels: 2 (female, male)
  • Language: Spanish
  • Problem Type: Single-label classification
  • Maximum Sequence Length: 512
  • Dropout: 0.4
  • Activation Function: ReLU
  • Output Dimension: 1
  • BERT Frozen: No
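
The details above describe a small classification head on top of the BERT encoder. As a rough illustration only, the sketch below shows what such a head could look like. The class name `GenderHead`, the intermediate size of 256, the hidden size of 768 (the usual BERT-base value), and the order of the layers are all assumptions made for this example; the actual FineBERT layers are not specified in this card.

```python
import torch
from torch import nn


class GenderHead(nn.Module):
    """Hypothetical head: dropout -> linear -> ReLU -> dropout -> linear(1)."""

    def __init__(self, hidden_size: int = 768, inner_size: int = 256, dropout: float = 0.4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, inner_size),  # inner_size is an assumed value
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(inner_size, 1),  # output dimension 1: one logit per text
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden_size) sentence representation from the encoder
        return self.net(pooled)


head = GenderHead().eval()  # eval() disables dropout at inference time
with torch.no_grad():
    logits = head(torch.randn(2, 768))  # random stand-in for encoder output
    probs = torch.sigmoid(logits)       # probability of the class encoded as 1
```

With a single output unit, the two labels are represented by one probability rather than by two separate scores, which is why the card lists 2 labels but an output dimension of 1.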

Training Details

  • Dataset: Custom dataset derived from the SPACCC corpus, preprocessed to exclude undetermined labels.
  • Training Epochs: 25
  • Batch Size: 8
  • Learning Rate: 2e-5
  • Optimizer: AdamW
  • Loss Function: Binary Cross Entropy Loss (BCELoss)
  • Weight Decay: 0.01
  • Warmup Steps: 0
  • Scheduler Factor: 0.5
  • Scheduler Patience: 2
  • Early Stopping Patience: 8
  • Evaluation Strategy: Per epoch
  • Device: CUDA
  • Framework: 🤗 Transformers
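
The hyperparameters above can be wired together roughly as follows. This is a sketch under stated assumptions, not the original training script: the "scheduler factor" and "scheduler patience" settings are assumed to refer to PyTorch's `ReduceLROnPlateau`, early stopping is assumed to monitor validation loss, and a toy linear layer with random data stands in for the real model and dataset. The helper `run_schedule` is hypothetical.

```python
import torch
from torch import nn

MAX_EPOCHS = 25
EARLY_STOPPING_PATIENCE = 8

model = nn.Linear(4, 1)  # toy stand-in for the fine-tuned model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
# Assumption: the scheduler halves the learning rate when validation loss plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)
loss_fn = nn.BCELoss()  # expects probabilities in [0, 1], so apply a sigmoid first

# One toy optimisation step with batch size 8 and random data.
x = torch.randn(8, 4)
y = torch.randint(0, 2, (8, 1)).float()
loss = loss_fn(torch.sigmoid(model(x)), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()


def run_schedule(val_losses):
    """Return the 0-based epoch at which training stops for the given validation losses."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    last_epoch = -1
    for epoch, val_loss in enumerate(val_losses[:MAX_EPOCHS]):
        last_epoch = epoch
        # A real loop would train for one epoch and evaluate here.
        scheduler.step(val_loss)
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= EARLY_STOPPING_PATIENCE:
                break
    return last_epoch


# Simulate a validation loss that never improves after the first epoch.
stopped_at = run_schedule([1.0] * MAX_EPOCHS)
final_lr = optimizer.param_groups[0]["lr"]
```

In this simulation the learning rate is reduced before early stopping triggers, because the scheduler patience (2) is shorter than the early stopping patience (8).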

Model Usage

The model is designed for gender classification in Spanish biomedical texts. Given an input text, it predicts one of two classes: female or male.

How to Use

import torch
from transformers import AutoTokenizer

from model import FineBERTModel  # custom model class from this project
from utils.import_config import FineBERTConfig  # custom config class from this project

# Load configuration
config = FineBERTConfig.from_pretrained("path/to/saved_models/BiGenderDetection")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
model = FineBERTModel.from_pretrained("path/to/saved_models/BiGenderDetection", config=config)
model.eval()  # make sure dropout is disabled for inference

text = "Paciente femenina de 45 años con antecedentes de hipertensión."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get predictions: the sigmoid turns the single logit into a probability,
# and rounding thresholds that probability at 0.5
with torch.no_grad():
    logits = model.get_logits(**inputs)
    prediction = torch.round(torch.sigmoid(logits)).cpu().numpy()

# Prints 0.0 or 1.0; check the training label encoding to map the value to female/male
print(prediction)
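
The rounding step in the snippet above amounts to thresholding the sigmoid probability at 0.5. The following plain-Python sketch spells that step out and attaches a label name to the result. The helper `decode` and the `ID2LABEL` mapping are hypothetical: this card does not state which class is encoded as 1, so the mapping must be checked against the training label encoding before use.

```python
import math

# Assumption: the order of this mapping is illustrative and must be verified.
ID2LABEL = {0: "female", 1: "male"}


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def decode(logit: float, threshold: float = 0.5):
    """Map one raw logit to (label, probability of the class encoded as 1)."""
    prob = sigmoid(logit)
    label_id = int(prob > threshold)
    return ID2LABEL[label_id], prob


label, prob = decode(3.0)
```

Returning the probability alongside the label makes it possible to flag low-confidence predictions, for example those close to 0.5, instead of treating every output as equally reliable.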

Limitations

  • The model is trained on Spanish biomedical text and may not generalize well to other domains.
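  • Predicting gender from text is error-prone, and predictions may reflect biases in the training data.
  • Because undetermined labels were excluded during preprocessing, the model always outputs female or male, even for text that contains no gender cue.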

Acknowledgments

This model is based on dccuchile/bert-base-spanish-wwm-cased and fine-tuned on biomedical data derived from the SPACCC corpus.

Model size: 110M params (Safetensors, tensor type F32)