---
library_name: transformers
tags: []
---
# BiGenderDetection Model Card

## Model Summary
This is a fine-tuned version of the `dccuchile/bert-base-spanish-wwm-cased` model for binary gender classification. The model was trained on a Spanish biomedical dataset to classify text into two categories: female and male.

## Model Details
- **Base Model:** `dccuchile/bert-base-spanish-wwm-cased`
- **Architecture:** FineBERT (custom classifier layers)
- **Number of Labels:** 2 (female, male)
- **Language:** Spanish
- **Problem Type:** Single-label classification
- **Maximum Sequence Length:** 512
- **Dropout:** 0.4
- **Activation Function:** ReLU
- **Output Dimension:** 1 (a single logit; a sigmoid maps it to a probability that is thresholded into the two classes)
- **BERT Frozen:** No

## Training Details
- **Dataset:** Custom dataset derived from the SPACCC corpus, preprocessed to exclude undetermined labels.
- **Training Epochs:** 25
- **Batch Size:** 8
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW
- **Loss Function:** Binary Cross Entropy Loss (BCELoss)
- **Weight Decay:** 0.01
- **Warmup Steps:** 0
- **Scheduler Factor:** 0.5
- **Scheduler Patience:** 2
- **Early Stopping Patience:** 8
- **Evaluation Strategy:** Per epoch
- **Device:** CUDA
- **Framework:** 🤗 Transformers
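
The scheduler factor, scheduler patience, and early stopping patience listed above interact over the course of training. The card does not name the scheduler, so the following is only a minimal pure-Python sketch that assumes a reduce-on-plateau rule driven by validation loss (cut the learning rate by the factor once the loss has failed to improve for more than `sched_patience` epochs) and an early-stopping rule that halts after `stop_patience` epochs without improvement. The function name `simulate_schedule` and the loss values are hypothetical and only illustrate the mechanics.

```python
def simulate_schedule(val_losses, lr=2e-5, factor=0.5,
                      sched_patience=2, stop_patience=8):
    """Return (learning rate used at each epoch, epoch index where training stops or None).

    Assumption: plateau-based LR reduction plus early stopping, both tracking
    validation loss. The model card does not confirm this exact logic.
    """
    best = float("inf")
    sched_bad = 0  # epochs without improvement since the last LR reduction
    stop_bad = 0   # epochs without improvement since the best loss
    lrs = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            sched_bad = 0
            stop_bad = 0
        else:
            sched_bad += 1
            stop_bad += 1
        # Reduce the learning rate once the plateau outlasts the patience window
        if sched_bad > sched_patience:
            lr *= factor
            sched_bad = 0
        lrs.append(lr)
        # Stop once there has been no improvement for stop_patience epochs
        if stop_bad >= stop_patience:
            return lrs, epoch
    return lrs, None


# Made-up validation losses: the best value is at epoch 1, then a slow drift upward
losses = [0.70, 0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69]
lrs, stopped_at = simulate_schedule(losses)
print(lrs)
print(stopped_at)
```

Under these assumptions, a run that stops improving early has its learning rate halved every three stalled epochs and ends well before the 25-epoch budget.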

## Model Usage
The model is designed for gender classification in Spanish biomedical texts. Given an input text, it predicts one of two classes: female or male.

## How to Use

```python
from transformers import AutoTokenizer
import torch

# FineBERTModel and FineBERTConfig are custom classes imported from the
# project's own `model` and `utils.import_config` modules, not from transformers.
from model import FineBERTModel
from utils.import_config import FineBERTConfig

# Load configuration
config = FineBERTConfig.from_pretrained("path/to/saved_models/BiGenderDetection")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
model = FineBERTModel.from_pretrained("path/to/saved_models/BiGenderDetection", config=config)
model.eval()  # disable dropout for inference

text = "Paciente femenina de 45 años con antecedentes de hipertensión."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get predictions: sigmoid turns the single logit into a probability,
# and rounding thresholds it at 0.5 to give a 0/1 class index
with torch.no_grad():
    logits = model.get_logits(**inputs)
    prediction = torch.round(torch.sigmoid(logits)).numpy()

print(prediction)
```
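
The snippet above prints a rounded 0/1 value rather than a class name. The following minimal sketch shows how a single logit could be turned into a label and a probability. The index-to-label mapping (`0` for female, `1` for male) is an assumption based on the order in which the classes are listed in this card; the card does not state it, so verify it against the training configuration before relying on it. The names `ID2LABEL` and `decode` are hypothetical.

```python
import math

# Hypothetical mapping: the model card does not say which index is which class.
ID2LABEL = {0: "female", 1: "male"}


def decode(logit, threshold=0.5):
    """Map a single raw logit to (label, probability of class 1)."""
    prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    # Strict comparison so that exactly 0.5 maps to class 0,
    # matching torch.round, which rounds 0.5 down to 0.
    label_id = int(prob > threshold)
    return ID2LABEL[label_id], prob


label, prob = decode(2.0)
print(label, round(prob, 3))
```

Returning the probability alongside the label makes it possible to flag low-confidence predictions (values near 0.5) for manual review instead of forcing a hard decision.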

## Limitations
- The model is trained on Spanish biomedical text and may not generalize well to other domains.
- Gender classification based on text is inherently challenging and may be influenced by biases in the training data.

## Acknowledgments
This model is based on `dccuchile/bert-base-spanish-wwm-cased` and fine-tuned on biomedical data derived from the SPACCC corpus.