BSC-NLP4BIA/binary-gender-classifier

	---
	library_name: transformers
	tags: []
	---
	# BiGenderDetection Model Card

	## Model Summary
	This is a fine-tuned version of the `dccuchile/bert-base-spanish-wwm-cased` model for binary gender classification. The model was trained on a Spanish biomedical dataset to classify text into two categories: female and male.

	## Model Details
	- Base Model: `dccuchile/bert-base-spanish-wwm-cased`
	- Architecture: FineBERT (custom classifier layers)
	- Number of Labels: 2 (female, male)
	- Language: Spanish
	- Problem Type: Single-label classification
	- Maximum Sequence Length: 512
	- Dropout: 0.4
	- Activation Function: ReLU
	- Output Dimension: 1
	- BERT Frozen: No
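
The combination of two labels with an output dimension of 1 can be read as a single logit passed through a sigmoid, which is also what a binary cross entropy loss expects. The following is a minimal sketch of that arithmetic in plain Python, not the model's actual code; the logit value is made up for illustration, and the card does not state which class is encoded as 0 and which as 1.

```python
import math

def sigmoid(logit: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def bce_loss(prob: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross entropy for one example, given a probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

logit = 2.0                        # hypothetical raw output of the single output unit
prob = sigmoid(logit)              # probability of the class encoded as 1
pred = 1 if prob > 0.5 else 0      # one threshold separates the two classes

loss_if_1 = bce_loss(prob, 1.0)    # small: the prediction agrees with target 1
loss_if_0 = bce_loss(prob, 0.0)    # large: the prediction disagrees with target 0
```

A single output unit is enough here because the probability of the second class is simply one minus the probability of the first.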

	## Training Details
	- Dataset: Custom dataset derived from the SPACCC corpus, preprocessed to exclude undetermined labels.
	- Training Epochs: 25
	- Batch Size: 8
	- Learning Rate: 2e-5
	- Optimizer: AdamW
	- Loss Function: Binary Cross Entropy Loss (BCELoss)
	- Weight Decay: 0.01
	- Warmup Steps: 0
	- Scheduler Factor: 0.5
	- Scheduler Patience: 2
	- Early Stopping Patience: 8
	- Evaluation Strategy: Per epoch
	- Device: CUDA
	- Framework: 🤗 Transformers
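
The scheduler factor and patience listed above are consistent with a reduce-on-plateau learning rate schedule, although the card does not name the scheduler. Under that assumption, the sketch below replays a list of per-epoch validation losses in plain Python to show how the learning rate reductions and early stopping could interact. The function `simulate_schedule`, the exact stopping rule, and the loss values are hypothetical and not taken from the training code.

```python
def simulate_schedule(val_losses, lr=2e-5, factor=0.5,
                      sched_patience=2, stop_patience=8):
    """Replay per-epoch validation losses; return (lr_history, stopped_epoch)."""
    best = float("inf")
    bad_for_sched = 0  # epochs without improvement since the last LR change
    bad_for_stop = 0   # epochs without improvement since the best loss
    lr_history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_for_sched = 0
            bad_for_stop = 0
        else:
            bad_for_sched += 1
            bad_for_stop += 1
        # Assumed rule: cut the LR once the plateau exceeds the scheduler patience.
        if bad_for_sched > sched_patience:
            lr *= factor
            bad_for_sched = 0
        lr_history.append(lr)
        # Assumed rule: stop once the plateau reaches the early stopping patience.
        if bad_for_stop >= stop_patience:
            return lr_history, epoch
    return lr_history, None

# Made-up losses: two improving epochs followed by a long plateau.
history, stopped_epoch = simulate_schedule([0.60, 0.50] + [0.55] * 10)
```

With these assumed rules, the learning rate is halved more than once before early stopping ends the run, so a run can finish well short of the 25 configured epochs.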

	## Model Usage
	The model is designed for gender classification in Spanish biomedical texts. Given an input text, it predicts one of two classes: female or male.

	## How to Use

	```python
	from transformers import AutoTokenizer
	import torch
	from model import FineBERTModel # Import your custom model class
	from utils.import_config import FineBERTConfig

	# Load configuration
	config = FineBERTConfig.from_pretrained("path/to/saved_models/BiGenderDetection")

	# Load tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
	model = FineBERTModel.from_pretrained("path/to/saved_models/BiGenderDetection", config=config)

	text = "Paciente femenina de 45 años con antecedentes de hipertensión."
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

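	# Get predictions (eval mode disables dropout)
	model.eval()
	with torch.no_grad():
	    logits = model.get_logits(**inputs)
	    prediction = torch.round(torch.sigmoid(logits)).detach().numpy()

	# Each entry is 0.0 or 1.0; the class each value denotes depends on the training label encoding
	print(prediction)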
	```
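
The snippet above yields numeric values rather than label names. As a rough sketch, the helper below turns sigmoid probabilities into label strings using a mapping supplied by the caller. The function `decode_predictions` and the placeholder mapping are hypothetical; the card does not document which class is encoded as 0 and which as 1, so the mapping should come from the training label encoding.

```python
def decode_predictions(probabilities, id2label, threshold=0.5):
    """Turn sigmoid probabilities into label strings using a caller-supplied mapping."""
    labels = []
    for p in probabilities:
        class_id = 1 if p > threshold else 0
        labels.append(id2label[class_id])
    return labels

# Placeholder mapping: replace with the label encoding used at training time.
id2label = {0: "label_for_0", 1: "label_for_1"}

# Made-up probabilities standing in for torch.sigmoid(logits) values.
decoded = decode_predictions([0.91, 0.12, 0.50], id2label)
```

Making the mapping an explicit argument avoids hard-coding a guess about the label order.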

	## Limitations
	- The model is trained on Spanish biomedical text and may not generalize well to other domains.
	- Gender classification based on text is inherently challenging and may be influenced by biases in the training data.

	## Acknowledgments
	This model is based on `dccuchile/bert-base-spanish-wwm-cased` and fine-tuned on biomedical data derived from the SPACCC corpus.