Model Card for Medical Safety Classification Model AIShield
Model Details
Model Description
This model is designed for medical safety classification, distinguishing between medically safe and medically unsafe queries. It has been evaluated rigorously on multiple datasets to assess its reliability in safety-critical applications.
- Developed by: AIShield
- Model type: Transformer-based classification model
- Language(s) (NLP): English
- License: Non-permissive, private, not for commercialization
- Finetuned from model: distilbert-base-uncased
Model Sources
- Repository: [More Information Needed]
- Paper: [More Information Needed]
- Demo: [More Information Needed]
Uses
Direct Use
This model is intended for medical content moderation, ensuring that unsafe queries are flagged appropriately while minimizing false positives for safe content.
Downstream Use
- Can be fine-tuned further for broader safety classification, including generic unsafe content.
- May be integrated into health-related AI assistants to prevent the spread of misinformation.
Out-of-Scope Use
- Not intended for legal or regulatory decision-making.
- Not a substitute for medical expertise.
- Might not generalize well to non-medical domains without further training.
Bias, Risks, and Limitations
Risks and Limitations
- Potential Over-Filtering: Some safe medical queries may be incorrectly flagged as unsafe (0.059% false positive rate on the left-out medical safe dataset).
- Domain-Specific Performance: While effective on medical safety classification, performance varies slightly on generic unsafe content.
- False Negatives on Generic Unsafe Data: On Generic Unsafe Dataset #1, 5.26% of generic unsafe queries were misclassified as safe.
Recommendations
- Fine-tune with diverse safety datasets to improve generalization.
- Adjust classification thresholds to balance false positives and false negatives based on application needs (see the sketch below).
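Threshold adjustment can be done on top of the pipeline's per-label scores rather than its default argmax decision. The snippet below is a minimal sketch: the label name containing "unsafe" and the 0.7 threshold are illustrative assumptions, not values shipped with the model; check the model's id2label mapping before relying on it.

```python
from transformers import pipeline

# Load the classifier and request scores for every label instead of only the top one
classifier = pipeline(
    "text-classification",
    model="parmarm/medical_unsafe_detection_bert_final_v1",
    top_k=None,
)

def is_unsafe(query: str, threshold: float = 0.7) -> bool:
    """Flag a query as unsafe only if the 'unsafe' score clears the threshold.

    The label-name match below is an assumption; verify it against the model config.
    """
    scores = classifier([query])[0]  # list of {"label": ..., "score": ...} for this query
    unsafe_score = max(
        (s["score"] for s in scores if "unsafe" in s["label"].lower()),
        default=0.0,
    )
    return unsafe_score >= threshold

# Raising the threshold trades more false negatives for fewer false positives
print(is_unsafe("Is it safe to take ibuprofen with aspirin?", threshold=0.7))
```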
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import pipeline

# Load the classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="parmarm/medical_unsafe_detection_bert_final_v1")

# Classify a medical query as safe or unsafe
result = classifier("Is it safe to take ibuprofen with aspirin?")
print(result)
```
Training Details
Training Data
- Safe Questions: FreedomIntelligence/medical-o1-reasoning-SFT (25,371 questions)
- Unsafe Questions: AI4LIFE-GROUP/med-safety-bench (75,272 total; 25,371 used for training)
- Balanced training dataset: 50,742 samples
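As a rough illustration of how the balanced split above could be assembled, the sketch below samples an equal number of unsafe questions to match the safe set. The dataset config ("en"), the column names, and the label encoding are assumptions made for this example; consult the dataset cards for the actual schemas and the authors' preprocessing for the exact procedure.

```python
from datasets import load_dataset, concatenate_datasets

# Hedged sketch only: config name, column names, and label encoding are assumptions.
safe = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
unsafe = load_dataset("AI4LIFE-GROUP/med-safety-bench", split="train")

n = len(safe)  # 25,371 safe questions per this card
unsafe_sample = unsafe.shuffle(seed=42).select(range(n))  # match the safe count

# Map both sources to a common (text, label) schema; 0 = safe, 1 = unsafe (assumed encoding)
safe = safe.map(lambda x: {"text": x["Question"], "label": 0})
unsafe_sample = unsafe_sample.map(lambda x: {"text": x["question"], "label": 1})

balanced = concatenate_datasets(
    [safe.select_columns(["text", "label"]),
     unsafe_sample.select_columns(["text", "label"])]
).shuffle(seed=42)  # 50,742 samples total
```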
Left-Out Dataset
- Medical Unsafe Questions: AI4LIFE-GROUP/med-safety-bench (remaining 49,003 questions)
- Medical Safe Questions: medalpaca/medical_meadow_medqa (10,178 questions)
Generic Safety Dataset
- Generic Unsafe #1: AI4LIFE-GROUP/med-safety-bench (456 questions)
- Generic Unsafe #2: AmberYifan/AdvBench_safe (520 questions)
Training Procedure
- Output Directory: `./bert_medical_classifier_train`
- Evaluation Strategy: Epoch-based
- Save Strategy: Epoch-based
- Learning Rate: 1e-5
- Batch Size (Train & Eval): 32
- Gradient Accumulation Steps: 4
- Epochs: 2
- Weight Decay: 0.1
- Warmup Ratio: 0.06
- Logging Steps: 100
- Save Total Limit: 2
- Load Best Model at End: True
- Best Model Metric: `eval_loss`
- Dataloader Workers: 16
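The hyperparameters above map onto Hugging Face `TrainingArguments` roughly as follows; this is a sketch reconstructed from the list, not the original training script.

```python
from transformers import TrainingArguments

# Sketch of a TrainingArguments configuration matching the hyperparameters listed above
training_args = TrainingArguments(
    output_dir="./bert_medical_classifier_train",
    evaluation_strategy="epoch",        # epoch-based evaluation
    save_strategy="epoch",              # epoch-based checkpointing
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    weight_decay=0.1,
    warmup_ratio=0.06,
    logging_steps=100,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    dataloader_num_workers=16,
)
```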
Optimization Details
- Optimizer: AdamW (`lr=2e-5`, `weight_decay=0.1`, `fused=True`)
- Loss Function: Class-weighted CrossEntropyLoss
- Custom Trainer: Implements weighted loss computation
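A class-weighted CrossEntropyLoss inside a custom `Trainer` typically looks like the sketch below; the class name and the weight tensor are illustrative assumptions, not the authors' exact implementation.

```python
import torch
from torch import nn
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    """Sketch of a Trainer subclass that applies class-weighted CrossEntropyLoss."""

    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        # e.g. torch.tensor([1.0, 1.0]) for a balanced safe/unsafe split
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        weight = self.class_weights.to(logits.device) if self.class_weights is not None else None
        loss_fct = nn.CrossEntropyLoss(weight=weight)
        loss = loss_fct(logits.view(-1, model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```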
Post-Training Performance Metrics
Training Metrics
- Global Steps: 296
- Training Loss: 0.0663
- Training Runtime: 141.55 s
- Train Samples per Second: 268.75
- Train Steps per Second: 2.091
Evaluation Metrics
- Eval Loss: 0.0120
- Eval Accuracy: 99.68%
- Eval Precision: 99.41%
- Eval Recall: 99.94%
- Eval F1 Score: 99.68%
- Eval ROC-AUC: 99.998%
- Evaluation Runtime: 34.81 s
- Eval Samples per Second: 1639.08
- Eval Steps per Second: 51.24
Evaluation
Testing Data, Factors & Metrics
Datasets Used for Evaluation
| Dataset | Size | Category | Purpose |
|---|---|---|---|
| Balanced Medical Dataset | 50,742 | Medical Safe & Unsafe | Primary performance evaluation |
| Left-Out Medical Unsafe | 49,003 | Medical Unsafe | Evaluating recall for unsafe cases |
| Left-Out Medical Safe | 10,178 | Medical Safe | Evaluating false positives |
| Generic Unsafe #1 | 456 | Generic Unsafe | Checking generalization capability |
| Generic Unsafe #2 | 520 | Generic Unsafe | Further verification of generalization |
Evaluation Metrics
- Accuracy: Measures overall correctness.
- Precision (for Unsafe Queries): How many predicted unsafe cases were actually unsafe.
- Recall (for Unsafe Queries): How many actual unsafe cases were correctly identified.
- F1 Score: The harmonic mean of precision and recall.
- False Positive Rate (FPR): Percentage of safe queries misclassified as unsafe.
- False Negative Rate (FNR): Percentage of unsafe queries misclassified as safe.
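For reference, the rates above can be derived from a confusion matrix as in the sketch below; the toy labels and the encoding 1 = unsafe, 0 = safe are assumptions for illustration.

```python
from sklearn.metrics import confusion_matrix

# Toy predictions for illustration only; 1 = unsafe, 0 = safe (assumed encoding)
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)                          # predicted-unsafe that were truly unsafe
recall = tp / (tp + fn)                             # truly-unsafe queries that were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
fpr = fp / (fp + tn)                                # safe queries misclassified as unsafe
fnr = fn / (fn + tp)                                # unsafe queries misclassified as safe
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} fpr={fpr:.2f} fnr={fnr:.2f}")
```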
Results Summary
1. Balanced Medical Dataset (50,742 samples)
- Accuracy: 99.74%
- Precision (Unsafe): 99.49%
- Recall (Unsafe): 99.97%
- F1 Score: 99.73%
- False Positive Rate: 0.51%
- False Negative Rate: 0.03%
2. Left-Out Medical Unsafe Dataset (49,003 samples)
- Recall (Unsafe): 99.98%
- False Negative Rate: 0.0163%
3. Left-Out Medical Safe Dataset (10,178 samples)
- Accuracy/Specificity: 99.94%
- False Positive Rate: 0.059%
4. Generic Unsafe Dataset #1 (456 samples)
- Recall: 94.74%
- False Negative Rate: 5.26%
5. Generic Unsafe Dataset #2 (520 samples)
- Recall: 100%
Model Card Contact
For inquiries, contact AIShield.