AI Detect Model

Model Description

The AI Detect Model is a binary classification model designed to determine whether a given text is AI-generated (label=1) or written by a human (label=0). This model plays a crucial role in providing AI detection rewards, helping to prevent reward hacking during Reinforcement Learning with Cycle Consistency (RLCC). For more details, please refer to our paper.

This model is built upon the Longformer architecture and trained using our proprietary LMSYS-USP dataset. Specifically, in a dialogue context, texts generated by the assistant are labeled as AI-generated (label=1), while user-generated texts are assigned the opposite label (label=0).

Note: Our model is subject to the following constraints:

Maximum Context Length: Supports up to 4,096 tokens. Exceeding this may degrade performance; keep inputs within this limit for best results.

Language Limitation: Optimized for English. Non-English performance may vary due to limited training data.

Quick Start

You can utilize our AI detection model as demonstrated below:

from transformers import LongformerTokenizer, LongformerForSequenceClassification
import torch
import torch.nn.functional as F

class AIDetector:
    def __init__(self, model_name="allenai/longformer-base-4096", max_length=4096):
        """
        Initialize the AIDetector with a pretrained Longformer model and tokenizer.

        Args:
            model_name (str): The name or path of the pretrained Longformer model.
            max_length (int): The maximum sequence length for tokenization.
        """
        self.tokenizer = LongformerTokenizer.from_pretrained(model_name)
        self.model = LongformerForSequenceClassification.from_pretrained(model_name)
        self.model.eval()
        self.max_length = max_length
        self.tokenizer.padding_side = "right"

    @torch.no_grad()
    def get_probability(self, texts):
        inputs = self.tokenizer(texts, padding=True, truncation=True, max_length=self.max_length, return_tensors='pt')
        outputs = self.model(**inputs)
        probabilities = F.softmax(outputs.logits, dim=1)
        return probabilities

# Example usage
if __name__ == "__main__":
    """
    Demonstrate the usage of AIDetector to classify whether given texts are AI-generated.
    """
    # Initialize the detector with a custom model path
    classifier = AIDetector(model_name="/path/to/ai_detector")
    
    # Define sample texts for classification
    target_text = [
        "I am thinking about going away for vacation",
        "How can I help you today?"
    ]
    
    # Get classification probabilities
    result = classifier.get_probability(target_text)
    
    # Print results
    print("Classification Probabilities:", result)

Citation

If you find this model useful, please cite:

[Authors], "[Paper Title]," [Venue], [Year], [URL or DOI].