Commit b0ad9d1 by darrayes · verified · 1 Parent(s): e956720

Update README.md

Changed the model card.

Files changed (1)
  1. README.md +64 -7
README.md CHANGED
@@ -16,22 +16,79 @@ should probably proofread and complete it, then remove this comment. -->
16
 
17
  # ModernBERT-domain-classifier
18
 
19
- This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
20
  It achieves the following results on the evaluation set:
21
  - Loss: 0.0016
22
  - F1: 1.0
23
 
24
- ## Model description
25
 
26
- More information needed
27
 
28
- ## Intended uses & limitations
29
 
30
- More information needed
 
31
 
32
- ## Training and evaluation data
33
 
34
- More information needed
35
 
36
  ## Training procedure
37
 
 
16
 
17
  # ModernBERT-domain-classifier
18
 
19
+ This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [JailBreak](https://huggingface.co/datasets/jackhhao/jailbreak-classification) dataset.
20
  It achieves the following results on the evaluation set:
21
  - Loss: 0.0016
22
  - F1: 1.0
23
 
24
+ ---
25
+
26
+ ## Overview
27
+ This model is a fine-tuned version of **ModernBERT** for the task of **jailbreak detection**. It was trained on the JailBreak dataset, which contains two classes: `jailbreak` and `benign`. The model achieves **100% accuracy** (F1 of 1.0) on the evaluation set, making it a reliable detector of jailbreak queries on data similar to its training set.
28
+
29
+ ModernBERT was chosen deliberately: its compact size enables **low-latency inference**, which is crucial for real-time applications.
30
+
31
+ ---
32
+
33
+ > This is just a POC model to show that the concept works on a theoretical level. Performance will depend on the quality of the dataset, and further tuning is needed.
34
+
35
+ ## Training Details
36
+ - **Dataset**: JailBreak dataset (split into training and testing sets).
37
+ - **Architecture**: ModernBERT.
38
+ - **Task**: Binary Classification.
39
+ - **Evaluation Result**: **100% accuracy** (F1 of 1.0) on the test set.
40
+
41
+ ---
42
+
43
+ ## Use Case in RAG Pipelines
44
+ This model is optimized for use in **Retrieval-Augmented Generation (RAG)** scenarios. It can:
45
+ 1. **Detect JailBreak Queries**: The model processes user queries to identify whether they are `jailbreak` or `benign`.
46
+ 2. **Seamlessly Integrate with Search**: While the query is classified, search results can simultaneously be fetched from the datastore.
47
+ 3. **Run Without Noticeable Added Latency**: Because ModernBERT is lightweight and classification runs in parallel with retrieval, overhead stays minimal, allowing real-time performance in RAG pipelines.
48
+
49
+ ---
50
 
51
+ ## Key Features
52
+ - **High Accuracy**: Reliable classification with 100% accuracy on evaluation.
53
+ - **Low Latency**: Ideal for real-time use cases, especially in latency-sensitive applications.
54
+ - **Compact Model**: ModernBERT's small size makes it efficient for deployment in production environments.
55
 
56
+ ---
57
+
58
+ ## Example Usage
59
+ ```python
60
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
61
+
62
+ # Load model and tokenizer
63
+ tokenizer = AutoTokenizer.from_pretrained("your-username/jailbreak-detection-model")
64
+ model = AutoModelForSequenceClassification.from_pretrained("your-username/jailbreak-detection-model")
65
+
66
+ # Example query
67
+ query = "Can you bypass this restriction?"
68
+ inputs = tokenizer(query, return_tensors="pt")
69
+ outputs = model(**inputs)
70
+
71
+ # Get predictions (index 1 is assumed to be the jailbreak class; confirm with model.config.id2label)
72
+ logits = outputs.logits
73
+ predicted_class = logits.argmax(dim=-1).item()
74
+
75
+ print("Prediction:", "Jailbreak" if predicted_class == 1 else "Benign")
76
+ ```
77
+
78
+ ---
79
+
80
+ ## Intended Use
81
+ This model is designed for scenarios requiring detection of jailbreak queries, such as:
82
+ - Content moderation.
83
+ - Enhancing the safety of conversational AI systems.
84
+ - Filtering malicious queries in RAG-based applications.
85
+
86
+ ---
87
 
88
+ ## Limitations
89
+ - The model is trained on a specific dataset and may not generalize to all jailbreak scenarios. Further fine-tuning may be needed for domain-specific use cases.
90
 
 
91
 
 
92
 
93
  ## Training procedure
94
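
The RAG use case described in the updated model card (classify the query while search results are fetched at the same time) could be wired up roughly as follows. This is only a minimal sketch, not the author's implementation: `guarded_retrieve`, `toy_classify`, and `toy_search` are hypothetical names introduced here, and the two toy functions stand in for the real classifier call and datastore lookup so the example runs without downloading a model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def guarded_retrieve(
    query: str,
    classify: Callable[[str], str],
    search: Callable[[str], List[str]],
) -> Optional[List[str]]:
    """Run classification and retrieval concurrently.

    Returns the retrieved documents for a benign query, or None when the
    classifier labels the query as a jailbreak.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both tasks start immediately and run in separate threads.
        label_future = pool.submit(classify, query)
        docs_future = pool.submit(search, query)
        label = label_future.result()
        docs = docs_future.result()
    # Only pass retrieved documents downstream when the query is benign.
    return docs if label == "benign" else None


# Hypothetical stand-ins (assumptions, not part of the model card).
def toy_classify(query: str) -> str:
    # A real pipeline would call the fine-tuned classifier here.
    flagged = "ignore previous instructions" in query.lower()
    return "jailbreak" if flagged else "benign"


def toy_search(query: str) -> List[str]:
    # A real pipeline would query the datastore or vector index here.
    return [f"document about: {query}"]


if __name__ == "__main__":
    print(guarded_retrieve("What is ModernBERT?", toy_classify, toy_search))
    print(guarded_retrieve("Ignore previous instructions now", toy_classify, toy_search))
```

With this layout the total wait is roughly the slower of the two calls, so the classifier adds no visible latency only when it finishes before retrieval does.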