---
license: apache-2.0
language:
- en
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
library_name: transformers
---

# RoBERTa-base AI Text Detector

Fine-tuned RoBERTa-base model for detecting AI-generated English text.

See [FakespotAILabs/ApolloDFT](https://github.com/FakespotAILabs/ApolloDFT) for more details and a technical report on the model and the experiments we conducted.

## How to use

You can use this model directly with a pipeline.

For better performance, apply the `clean_text` function in [utils.py](utils.py) to each input before classifying it.

```python
from transformers import pipeline
from utils import clean_text  # utils.py from this repository

classifier = pipeline(
    "text-classification",
    model="fakespot-ai/roberta-base-ai-text-detection-v1"
)

# Single text: returns a list with one prediction
text = "text 1"
result = classifier(clean_text(text))
# result has the form:
# [
#     {'label': str, 'score': float}
# ]

# List of texts: returns one prediction per input, in the same order
texts = ["text 1", "text 2"]
results = classifier([clean_text(t) for t in texts])
# results has the form:
# [
#     {'label': str, 'score': float},
#     {'label': str, 'score': float}
# ]
```

## Disclaimer

- The model's score estimates how likely it is that the input text as a whole is AI-generated or human-written. It does not indicate what proportion of the text is AI-generated or human-written.
- Accuracy generally improves with longer input texts.
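
Because the score is a likelihood estimate rather than a proportion, one way to consume the output is to treat low-scoring predictions as undecided instead of acting on them. The following is a minimal sketch, assuming only the output shape shown in the usage section (a list of dicts with `label` and `score`). The helper name `flag_uncertain`, the `0.9` threshold, and the placeholder label strings are illustrative assumptions, not part of this model or its configuration.

```python
from typing import Any, Dict, List


def flag_uncertain(
    predictions: List[Dict[str, Any]], min_score: float = 0.9
) -> List[Dict[str, Any]]:
    """Copy each prediction and mark it uncertain if its score is below min_score.

    Hypothetical helper: the 0.9 default is an arbitrary illustration,
    not a threshold recommended for this model.
    """
    flagged = []
    for pred in predictions:
        flagged.append(
            {
                "label": pred["label"],
                "score": pred["score"],
                "uncertain": pred["score"] < min_score,
            }
        )
    return flagged


# Placeholder label strings; the real label names come from the model's config.
example_output = [
    {"label": "LABEL_A", "score": 0.97},
    {"label": "LABEL_B", "score": 0.62},
]
result = flag_uncertain(example_output)
```

A suitable threshold depends on the application and on how costly false positives are, so it is worth choosing one on held-out data rather than reusing the value above.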