	---
	library_name: transformers
	license: apache-2.0
	base_model: answerdotai/ModernBERT-base
	tags:
	- generated_from_trainer
	metrics:
	- f1
	model-index:
	- name: ModernBERT-domain-classifier
	  results: []
	---

	# ModernBERT-domain-classifier

	This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [JailBreak](https://huggingface.co/datasets/jackhhao/jailbreak-classification) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.0016
	- F1: 1.0

	---

	## Overview
	This model is a fine-tuned version of ModernBERT for jailbreak detection. It was trained on the JailBreak dataset, which contains two classes: `jailbreak` and `benign`. The model achieves an F1 score of 1.0 (100% accuracy) on the evaluation set, which indicates that it reliably separates jailbreak queries from benign ones on data drawn from the same distribution.

	ModernBERT was chosen deliberately for its compact size, which enables the low-latency inference that real-time applications require.

	---

	> This is a proof-of-concept (POC) model intended to show that the approach works in principle. Real-world performance will depend on the quality of the dataset, and further tuning is needed.

	## Training Details
	- Dataset: JailBreak dataset (split into training and test sets).
	- Architecture: ModernBERT (`answerdotai/ModernBERT-base`).
	- Task: Binary classification (`jailbreak` vs. `benign`).
	- Evaluation metric: F1, reaching 1.0 (100% accuracy) on the test set.
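The card does not say exactly how F1 was computed. A minimal sketch, assuming binary F1 with class id 1 (`jailbreak`) as the positive class, written as a `compute_metrics` function of the kind `transformers.Trainer` accepts through its `compute_metrics` argument. The body is an illustration, not the author's code; it uses plain Python so it works on lists as well as on the NumPy arrays the Trainer passes in.

```python
from typing import Dict, Sequence, Tuple


def compute_metrics(eval_pred: Tuple[Sequence[Sequence[float]], Sequence[int]]) -> Dict[str, float]:
    """Binary F1 with class id 1 ("jailbreak") as the positive class (assumption)."""
    logits, labels = eval_pred
    # Predicted class id = index of the largest logit in each row.
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]

    # Count true positives, false positives, and false negatives for class 1.
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y != 1)
    fn = sum(1 for p, y in zip(preds, labels) if p != 1 and y == 1)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"f1": f1}


if __name__ == "__main__":
    logits = [[0.1, 2.0], [3.0, 0.0], [0.0, 1.0], [2.0, 0.5]]
    labels = [1, 0, 0, 1]
    print(compute_metrics((logits, labels)))  # {'f1': 0.5}
```

An F1 of 1.0 under this definition means there were no false positives and no false negatives, which is why it coincides with 100% accuracy on a binary task.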

	---

	## Use Case in RAG Pipelines
	This model is designed to act as a guardrail in Retrieval-Augmented Generation (RAG) pipelines. It can:
	1. Detect jailbreak queries: the model classifies each user query as `jailbreak` or `benign`.
	2. Run alongside search: while the query is being classified, search results can be fetched from the datastore in parallel.
	3. Add little to no latency: because classification overlaps with retrieval and ModernBERT is lightweight, the check adds minimal overhead to real-time RAG pipelines.
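A minimal sketch of this parallel pattern using Python's `asyncio`. `classify_query` and `fetch_documents` are hypothetical placeholders, not part of this model's API: here they are simulated with short sleeps and a trivial keyword rule, whereas in practice the first would wrap the ModernBERT classifier and the second would query your datastore. Both blocking calls run in worker threads via `asyncio.to_thread` (Python 3.9+), so the wall-clock cost is roughly that of the slower call rather than the sum of both.

```python
import asyncio
import time
from typing import Dict, List


def classify_query(query: str) -> str:
    """Hypothetical stand-in for the jailbreak classifier (blocking call)."""
    time.sleep(0.1)  # simulate model inference time
    return "jailbreak" if "ignore all previous instructions" in query.lower() else "benign"


def fetch_documents(query: str) -> List[str]:
    """Hypothetical stand-in for a datastore / vector-store lookup (blocking call)."""
    time.sleep(0.1)  # simulate retrieval latency
    return [f"doc about: {query}"]


async def guarded_retrieve(query: str) -> Dict[str, object]:
    # Start classification and retrieval concurrently in worker threads.
    label, docs = await asyncio.gather(
        asyncio.to_thread(classify_query, query),
        asyncio.to_thread(fetch_documents, query),
    )
    if label == "jailbreak":
        # Discard the retrieved context so it never reaches the generator.
        return {"label": label, "documents": []}
    return {"label": label, "documents": docs}


if __name__ == "__main__":
    print(asyncio.run(guarded_retrieve("What is ModernBERT?")))
```

When a query is flagged, the retrieved documents are simply dropped; a real pipeline might additionally log the query or return a refusal message.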

	---

	## Key Features
	- High Accuracy: 100% accuracy (F1 = 1.0) on the evaluation set.
	- Low Latency: Ideal for real-time use cases, especially in latency-sensitive applications.
	- Compact Model: ModernBERT's small size makes it efficient to deploy in production environments.

	---

	## Example Usage
	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load model and tokenizer
	model_id = "darrayes/expentor-JB-detector"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForSequenceClassification.from_pretrained(model_id)
	model.eval()  # inference mode (disables dropout)

	# Example query
	query = "Can you bypass this restriction?"
	inputs = tokenizer(query, return_tensors="pt", truncation=True)

	# Run the forward pass without tracking gradients
	with torch.no_grad():
	    outputs = model(**inputs)

	# Get predictions: class index 1 is treated as "jailbreak", index 0 as "benign"
	logits = outputs.logits
	predicted_class = logits.argmax(dim=-1).item()

	print("Prediction:", "Jailbreak" if predicted_class == 1 else "Benign")
	```
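The example above takes the argmax of the logits. For moderation it can help to flag a query only when the predicted jailbreak probability clears a tunable threshold, trading recall for precision. A minimal sketch in plain Python, assuming (as the example does) that class index 1 is `jailbreak`; the helper names `softmax` and `should_block` are hypothetical, and any specific threshold is a choice to tune on your own data.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def should_block(logits: Sequence[float], threshold: float = 0.5, jailbreak_index: int = 1) -> bool:
    """Return True when P(jailbreak) is at or above the threshold."""
    return softmax(logits)[jailbreak_index] >= threshold


if __name__ == "__main__":
    print(should_block([0.0, 3.0]))                  # True: P(jailbreak) is about 0.95
    print(should_block([0.0, 3.0], threshold=0.99))  # False: 0.95 is below 0.99
```

With the model loaded as in the example, `should_block(outputs.logits[0].tolist(), threshold=0.9)` would flag only confident jailbreak predictions.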

	---

	## Intended Use
	This model is designed for scenarios requiring detection of jailbreak queries, such as:
	- Content moderation.
	- Enhancing the safety of conversational AI systems.
	- Filtering malicious queries in RAG-based applications.

	---

	## Limitations
	- The model is trained on a specific dataset and may not generalize to all jailbreak scenarios. Further fine-tuning may be needed for domain-specific use cases.



	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 32
	- eval_batch_size: 16
	- seed: 42
	- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 5
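With `lr_scheduler_type: linear` and no warmup listed, the learning rate decays linearly from 5e-05 to zero over the run. The sketch below reproduces that decay in plain Python, mirroring the formula used by `transformers.get_linear_schedule_with_warmup` without importing it. It assumes zero warmup steps, a single device (so `train_batch_size` equals `per_device_train_batch_size`), and 33 steps per epoch as read from the training results table. The dictionary maps the listed hyperparameters onto real `TrainingArguments` field names and is illustrative only.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
# Illustrative only; single-device training is assumed for the batch sizes.
hyperparameters = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}

STEPS_PER_EPOCH = 33  # read from the "Step" column of the training results
TOTAL_STEPS = STEPS_PER_EPOCH * hyperparameters["num_train_epochs"]  # 165


def linear_lr(step: int, base_lr: float = 5e-5, total_steps: int = TOTAL_STEPS,
              warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch in range(hyperparameters["num_train_epochs"] + 1):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch}: step {step:3d}, lr = {linear_lr(step):.2e}")
```

Each epoch therefore removes one fifth of the initial learning rate (about 1e-05), reaching zero at step 165.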

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | F1     |
	|:-------------:|:-----:|:----:|:---------------:|:------:|
	| No log        | 1.0   | 33   | 0.0246          | 0.9848 |
	| No log        | 2.0   | 66   | 0.0042          | 1.0    |
	| No log        | 3.0   | 99   | 0.0019          | 1.0    |
	| 0.0755        | 4.0   | 132  | 0.0017          | 1.0    |
	| 0.0755        | 5.0   | 165  | 0.0016          | 1.0    |
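The Step column follows from the batch size. Assuming a single device, no gradient accumulation, and the Trainer's default of keeping the last partial batch, 33 steps per epoch at a batch size of 32 bounds the size of the training split, written here as $N_{\text{train}}$ (a symbol introduced only for this calculation):

```latex
\text{steps per epoch} = \left\lceil \frac{N_{\text{train}}}{32} \right\rceil = 33
\;\Longrightarrow\; 1025 \le N_{\text{train}} \le 1056,
\qquad
\text{total steps} = 33 \times 5 = 165.
```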


	### Framework versions

	- Transformers 4.48.0.dev0
	- Pytorch 2.5.0+cu124
	- Datasets 3.1.0
	- Tokenizers 0.21.0