---
library_name: transformers
tags:
- jailbreak-detection
- safety
- security
language:
- en
metrics:
- accuracy
- roc_auc
base_model:
- prajjwal1/bert-tiny
- google-bert/bert-base-uncased
pipeline_tag: text-classification
---

# Model Card for Model ID

A small BERT-based classifier that detects saturation jailbreak attacks. It is not intended for standalone use against other kinds of jailbreaks.

## Model Details

### Model Description

- **Developed by:** Guardrails AI, Joseph Catrambone
- **Funded by:** Guardrails AI
- **Model type:** Transformer, BERT
- **Language(s) (NLP):** English
- **License:** Restrictive
- **Finetuned from model:** bert-tiny

### Model Sources

- **Repository:** https://www.github.com/guardrails-ai/detect-jailbreak

## Uses

Designed as a small prefilter for a subset of saturation attacks.

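As a rough illustration of the prefilter role described above, the sketch below gates a prompt on a text-classification result before it reaches a downstream system. The label name `LABEL_1`, the `0.5` threshold, and the helper names `is_saturation_attack` and `prefilter` are assumptions made for illustration; they are not taken from this model card, so check the model's configuration for the real label mapping. The classifier is passed in as a callable that returns a list of `{"label", "score"}` dictionaries, which is the shape a `transformers` text-classification pipeline returns for a single string.

```python
from typing import Callable, Dict, List, Optional

# Assumptions for illustration only: the positive label name and the
# decision threshold are hypothetical, not documented by this model card.
JAILBREAK_LABEL = "LABEL_1"
THRESHOLD = 0.5

# A classifier maps one string to a list like [{"label": ..., "score": ...}].
# With transformers installed, one could be built roughly as:
#   from transformers import pipeline
#   classifier = pipeline("text-classification", model=<this model's repo id>)
Classifier = Callable[[str], List[Dict[str, object]]]


def is_saturation_attack(
    text: str,
    classifier: Classifier,
    label: str = JAILBREAK_LABEL,
    threshold: float = THRESHOLD,
) -> bool:
    """Return True when the top prediction is the attack label above threshold."""
    top = classifier(text)[0]
    return top["label"] == label and float(top["score"]) >= threshold


def prefilter(
    prompt: str,
    classifier: Classifier,
    downstream: Callable[[str], str],
) -> Optional[str]:
    """Run `downstream` only if the prompt passes the prefilter; else return None."""
    if is_saturation_attack(prompt, classifier):
        return None
    return downstream(prompt)
```

Because the model targets only a subset of saturation attacks, a gate like this would normally sit in front of other checks rather than replace them.
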
### Out-of-Scope Use

This model is not designed to catch other types of jailbreaks. Saturation protection is one part of a more complete suite of defenses against improper use of ML systems.