---
license: other
license_name: govtech-singapore
license_link: LICENSE
datasets:
- gabrielchua/off-topic
language:
- en
metrics:
- roc_auc
- f1
- precision
- recall
base_model:
- cross-encoder/stsb-roberta-base
---
# Off-Topic Classification Model
This model leverages a fine-tuned **Cross Encoder STSB Roberta Base** to perform binary classification, determining whether a user prompt is off-topic in relation to the system's intended purpose as defined by the system prompt.
## Model Highlights
- **Base Model**: [`stsb-roberta-base`](https://huggingface.co/cross-encoder/stsb-roberta-base)
- **Maximum Context Length**: 514 tokens
- **Task**: Binary classification (on-topic/off-topic)
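A cross-encoder scores both texts jointly, so the system prompt and user prompt are fed to the model as a single pair and the output is reduced to a binary label. A minimal sketch of that final thresholding step, assuming the model emits an off-topic probability (the `0.5` threshold here is an illustrative assumption, not a value from this card):

```python
# Hypothetical sketch of mapping a cross-encoder score to a binary label.
# `score` is assumed to be the model's off-topic probability; the 0.5
# threshold is an illustrative assumption, not taken from the model card.

def label_from_score(score: float, threshold: float = 0.5) -> str:
    """Map an off-topic probability to an on-topic/off-topic label."""
    return "off-topic" if score >= threshold else "on-topic"

print(label_from_score(0.92))  # high probability -> "off-topic"
print(label_from_score(0.10))  # low probability  -> "on-topic"
```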
## Performance
We evaluated our fine-tuned models on synthetic data modelling system–user prompt pairs that reflect real-world enterprise use cases of LLMs. The dataset is available [here](https://huggingface.co/datasets/gabrielchua/off-topic).
| Approach | Model | ROC-AUC | F1 | Precision | Recall |
|---------------------------------------|--------------------------------|---------|------|-----------|--------|
| 👉 [Fine-tuned bi-encoder classifier](https://huggingface.co/govtech/jina-embeddings-v2-small-en-off-topic) | jina-embeddings-v2-small-en | 0.99 | 0.97 | 0.99 | 0.95 |
| [Fine-tuned cross-encoder classifier](https://huggingface.co/govtech/stsb-roberta-base-off-topic) | stsb-roberta-base | 0.99 | 0.99 | 0.99 | 0.99 |
| Pre-trained cross-encoder | stsb-roberta-base | 0.73 | 0.68 | 0.53 | 0.93 |
| Prompt Engineering | GPT 4o (2024-08-06) | - | 0.95 | 0.94 | 0.97 |
| Prompt Engineering | GPT 4o Mini (2024-07-18) | - | 0.91 | 0.85 | 0.91 |
| Zero-shot Classification | GPT 4o Mini (2024-07-18) | 0.99 | 0.97 | 0.95 | 0.99 |
Further evaluation results on additional synthetic and external datasets (e.g., `JailbreakBench`, `HarmBench`, `TrustLLM`) are available in our [technical report](https://arxiv.org/abs/2411.12946).
## Usage
1. Clone this repository and install the required dependencies:
```bash
pip install -r requirements.txt
```
2. You can run the model using two options:
**Option 1**: Using `inference_onnx.py` with the ONNX Model.
```bash
python inference_onnx.py '[
    ["System prompt example 1", "User prompt example 1"],
    ["System prompt example 2", "User prompt example 2"]
]'
```
**Option 2**: Using `inference_safetensors.py` with PyTorch and SafeTensors.
```bash
python inference_safetensors.py '[
    ["System prompt example 1", "User prompt example 1"],
    ["System prompt example 2", "User prompt example 2"]
]'
```
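Both scripts take their input as a single JSON string: a list of `[system_prompt, user_prompt]` pairs. When calling them from other code, it is easier to build that argument programmatically than to hand-quote it. A small sketch (the prompt texts are illustrative, not from the repository):

```python
# Build the JSON argument expected by inference_onnx.py /
# inference_safetensors.py: a JSON list of [system_prompt, user_prompt] pairs.
import json
import shlex

pairs = [
    ["You are a customer support bot for a bank.", "How do I reset my password?"],
    ["You are a customer support bot for a bank.", "Write me a poem about cats."],
]

payload = json.dumps(pairs)

# shlex.quote makes the payload safe to pass as a single shell argument:
print(f"python inference_onnx.py {shlex.quote(payload)}")
```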
Read more about this model in our [technical report](https://arxiv.org/abs/2411.12946).