---
library_name: transformers
tags: []
---
# BiGenderDetection Model Card
## Model Summary
This is a fine-tuned version of the `dccuchile/bert-base-spanish-wwm-cased` model for binary gender classification. The model was trained on a Spanish biomedical dataset to classify text into two categories: female and male.
## Model Details
- **Base Model:** `dccuchile/bert-base-spanish-wwm-cased`
- **Architecture:** FineBERT (custom classifier layers)
- **Number of Labels:** 2 (female, male)
- **Language:** Spanish
- **Problem Type:** Single-label classification
- **Maximum Sequence Length:** 512
- **Dropout:** 0.4
- **Activation Function:** ReLU
- **Output Dimension:** 1 (a single value, passed through a sigmoid and rounded to select one of the two classes)
- **BERT Frozen:** No
## Training Details
- **Dataset:** Custom dataset derived from the SPACCC corpus, preprocessed to exclude undetermined labels.
- **Training Epochs:** 25
- **Batch Size:** 8
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW
- **Loss Function:** Binary Cross Entropy Loss (BCELoss)
- **Weight Decay:** 0.01
- **Warmup Steps:** 0
- **Scheduler Factor:** 0.5
- **Scheduler Patience:** 2
- **Early Stopping Patience:** 8
- **Evaluation Strategy:** Per epoch
- **Device:** CUDA
- **Framework:** 🤗 Transformers
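
The "Scheduler Factor" and "Scheduler Patience" entries suggest a plateau-based learning-rate schedule, although this card does not name the scheduler. Under that assumption, the following pure-Python sketch simulates how the learning rate and early stopping might interact over a run. The function `simulate_schedule` and the validation losses are hypothetical illustrations, not part of the training code.

```python
def simulate_schedule(val_losses, lr=2e-5, factor=0.5, sched_patience=2,
                      stop_patience=8, max_epochs=25):
    """Simulate plateau-based LR decay combined with early stopping.

    ASSUMPTION: the scheduler reduces the LR when validation loss stops
    improving (similar in spirit to PyTorch's ReduceLROnPlateau).
    The card itself only lists a factor and a patience.
    """
    best = float("inf")
    bad_sched = 0  # non-improving epochs since the last improvement or LR cut
    bad_stop = 0   # non-improving epochs since the best epoch
    history = []   # (epoch, learning rate after this epoch's scheduler step)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            bad_sched = 0
            bad_stop = 0
        else:
            bad_sched += 1
            bad_stop += 1
        # Cut the LR once the plateau has lasted longer than the patience.
        if bad_sched > sched_patience:
            lr *= factor
            bad_sched = 0
        history.append((epoch, lr))
        # Stop once no improvement has been seen for stop_patience epochs.
        if bad_stop >= stop_patience:
            break
    return history


# Hypothetical validation losses: one good epoch, then a long plateau.
for epoch, lr in simulate_schedule([0.5] + [0.6] * 20):
    print(f"epoch {epoch:2d}  lr {lr:.2e}")
```

With an early stopping patience of 8 and a scheduler patience of 2, a run that stops improving would see its learning rate halved more than once before training is halted.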
## Model Usage
The model is designed for gender classification in Spanish biomedical texts. Given an input text, it predicts one of two classes: female or male.
## How to Use
```python
from transformers import AutoTokenizer
import torch
from model import FineBERTModel  # Import your custom model class
from utils.import_config import FineBERTConfig

# Load configuration
config = FineBERTConfig.from_pretrained("path/to/saved_models/BiGenderDetection")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
model = FineBERTModel.from_pretrained("path/to/saved_models/BiGenderDetection", config=config)
model.eval()  # Disable dropout for inference

# Tokenize the input text
text = "Paciente femenina de 45 años con antecedentes de hipertensión."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get predictions without tracking gradients
with torch.no_grad():
    logits = model.get_logits(**inputs)

# Apply a sigmoid and round to obtain the binary class (0 or 1)
prediction = torch.round(torch.sigmoid(logits)).detach().numpy()
print(prediction)
```
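
The snippet above prints a raw 0/1 value, not a class name. A minimal sketch of the thresholding step with a label lookup follows. The index-to-label mapping (0 = female, 1 = male) is an assumption based only on the order in which this card lists the labels, so it should be checked against the model's configuration. `ID2LABEL` and `logit_to_label` are hypothetical names.

```python
import math

# ASSUMPTION: label indices follow the order listed in this card (female, male).
# Verify against the saved model configuration before relying on it.
ID2LABEL = {0: "female", 1: "male"}


def logit_to_label(logit, threshold=0.5):
    """Map a single raw model output to a (label, probability) pair."""
    prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    # A strict comparison sends an exact 0.5 to class 0, which matches
    # torch.round (it rounds halves to the nearest even integer).
    return ID2LABEL[int(prob > threshold)], prob


label, prob = logit_to_label(-1.2)  # hypothetical logit value
print(label, round(prob, 3))
```

Returning the probability alongside the label makes it possible to flag low-confidence predictions near the threshold.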
## Limitations
- The model is trained on Spanish biomedical text and may not generalize well to other domains.
- Gender classification based on text is inherently challenging and may be influenced by biases in the training data.
## Acknowledgments
This model is based on `dccuchile/bert-base-spanish-wwm-cased` and fine-tuned on biomedical data derived from the SPACCC corpus.