---
license: other
license_name: govtech-singapore
license_link: LICENSE
datasets:
- gabrielchua/off-topic
language:
- en
metrics:
- roc_auc
- f1
- precision
- recall
base_model:
- cross-encoder/stsb-roberta-base
---

# Off-Topic Classification Model

This model leverages a fine-tuned **Cross Encoder STSB Roberta Base** to perform binary classification, determining whether a user prompt is off-topic in relation to the system's intended purpose as defined by the system prompt.

## Model Highlights

- **Base Model**: [`stsb-roberta-base`](https://huggingface.co/cross-encoder/stsb-roberta-base)
- **Maximum Context Length**: 514 tokens
- **Task**: Binary classification (on-topic/off-topic)

## Performance

We evaluated our fine-tuned models on synthetic data modelling system-and-user-prompt pairs that reflect real-world enterprise use cases of LLMs. The dataset is available [here](https://huggingface.co/datasets/gabrielchua/off-topic).

| Approach                              | Model                          | ROC-AUC | F1   | Precision | Recall |
|---------------------------------------|--------------------------------|---------|------|-----------|--------|
| 👉 [Fine-tuned bi-encoder classifier](https://huggingface.co/govtech/jina-embeddings-v2-small-en-off-topic)      | jina-embeddings-v2-small-en    | 0.99    | 0.97 | 0.99      | 0.95   |
| [Fine-tuned cross-encoder classifier](https://huggingface.co/govtech/stsb-roberta-base-off-topic)   | stsb-roberta-base              | 0.99    | 0.99 | 0.99      | 0.99   |
| Pre-trained cross-encoder             | stsb-roberta-base              | 0.73    | 0.68 | 0.53      | 0.93   |
| Prompt Engineering             | GPT-4o (2024-08-06)              | -    | 0.95 | 0.94      | 0.97   |
| Prompt Engineering             | GPT-4o Mini (2024-07-18)              | -    | 0.91 | 0.85      | 0.91   |
| Zero-shot Classification             | GPT-4o Mini (2024-07-18)              | 0.99    | 0.97 | 0.95      | 0.99   |

Further evaluation results on additional synthetic and external datasets (e.g., `JailbreakBench`, `HarmBench`, `TrustLLM`) are available in our [technical report](https://arxiv.org/abs/2411.12946).

## Usage
1. Clone this repository and install the required dependencies:

    ```bash
    pip install -r requirements.txt
    ```

2. Run the model using one of two options:

    **Option 1**: Using `inference_onnx.py` with the ONNX model.

    ```bash
    python inference_onnx.py '[
        ["System prompt example 1", "User prompt example 1"],
        ["System prompt example 2", "User prompt example 2"]
    ]'
    ```

    **Option 2**: Using `inference_safetensors.py` with PyTorch and SafeTensors.

    ```bash
    python inference_safetensors.py '[
        ["System prompt example 1", "User prompt example 1"],
        ["System prompt example 2", "User prompt example 2"]
    ]'
    ```

Read more about this model in our [technical report](https://arxiv.org/abs/2411.12946).