---
library_name: transformers
tags: []
---

# BiGenderDetection Model Card

## Model Summary

This is a fine-tuned version of the `dccuchile/bert-base-spanish-wwm-cased` model for binary gender classification. The model was trained on a Spanish biomedical dataset to classify text into two categories: female and male.

## Model Details

- **Base Model:** `dccuchile/bert-base-spanish-wwm-cased`
- **Architecture:** FineBERT (custom classifier layers)
- **Number of Labels:** 2 (female, male)
- **Language:** Spanish
- **Problem Type:** Single-label classification
- **Maximum Sequence Length:** 512
- **Dropout:** 0.4
- **Activation Function:** ReLU
- **Output Dimension:** 1
- **BERT Frozen:** No

## Training Details

- **Dataset:** Custom dataset derived from the SPACCC corpus, preprocessed to exclude undetermined labels.
- **Training Epochs:** 25
- **Batch Size:** 8
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW
- **Loss Function:** Binary Cross-Entropy Loss (BCELoss)
- **Weight Decay:** 0.01
- **Warmup Steps:** 0
- **Scheduler Factor:** 0.5
- **Scheduler Patience:** 2
- **Early Stopping Patience:** 8
- **Evaluation Strategy:** Per epoch
- **Device:** CUDA
- **Framework:** šŸ¤— Transformers

An illustrative sketch of how the architecture details and training hyperparameters above fit together is provided in the appendix at the end of this card.

## Model Usage

The model is designed for gender classification in Spanish biomedical texts. Given an input text, it predicts one of two classes: female or male.

## How to Use

```python
from transformers import AutoTokenizer
import torch

from model import FineBERTModel                 # custom model class shipped with this repository
from utils.import_config import FineBERTConfig

# Load the configuration of the fine-tuned model
config = FineBERTConfig.from_pretrained("path/to/saved_models/BiGenderDetection")

# Load the base-model tokenizer and the fine-tuned weights
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
model = FineBERTModel.from_pretrained("path/to/saved_models/BiGenderDetection", config=config)
model.eval()

# "Female patient, 45 years old, with a history of hypertension."
text = "Paciente femenina de 45 aƱos con antecedentes de hipertensión."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get the prediction: the single sigmoid output is rounded to 0 or 1
with torch.no_grad():
    logits = model.get_logits(**inputs)
prediction = torch.round(torch.sigmoid(logits)).detach().numpy()
print(prediction)  # array with a single value of 0.0 or 1.0
```

## Limitations

- The model is trained on Spanish biomedical text and may not generalize well to other domains.
- Gender classification from text is inherently challenging and may be influenced by biases in the training data.

## Acknowledgments

This model is based on `dccuchile/bert-base-spanish-wwm-cased` and fine-tuned on biomedical data derived from the SPACCC corpus.
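
## Appendix: Illustrative Training Setup

The repository's training script is not reproduced here. The sketch below is a minimal, hypothetical reconstruction of how the items listed under Model Details and Training Details could be assembled in PyTorch: `GenderClassifierSketch` and its head layout are assumptions (the exact FineBERT classifier layers are not documented), and `ReduceLROnPlateau` is inferred from the scheduler factor/patience values. Only the numeric hyperparameters are taken from this card.

```python
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import ReduceLROnPlateau
from transformers import AutoModel


class GenderClassifierSketch(nn.Module):
    """Hypothetical stand-in for the custom FineBERT classifier."""

    def __init__(self, base_name="dccuchile/bert-base-spanish-wwm-cased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(base_name)  # BERT weights are not frozen
        hidden = self.bert.config.hidden_size
        self.head = nn.Sequential(                        # assumed head layout
            nn.Dropout(0.4),                              # dropout from the card
            nn.Linear(hidden, hidden),
            nn.ReLU(),                                    # activation from the card
            nn.Linear(hidden, 1),                         # single output for BCELoss
        )

    def get_logits(self, **inputs):
        pooled = self.bert(**inputs).last_hidden_state[:, 0]  # [CLS] representation
        return self.head(pooled)


model = GenderClassifierSketch()
criterion = nn.BCELoss()                                  # applied to sigmoid(logits)
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = ReduceLROnPlateau(optimizer, factor=0.5, patience=2)  # scheduler type assumed
```

In a training loop matching the card, the scheduler would be stepped on the per-epoch validation loss, with early stopping after 8 epochs without improvement.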