Safetensors
English
gpt2
Not-For-All-Audiences
Commit 91e4e49 (verified), committed by leondz
1 Parent(s): 3a48eee

Update README.md

Files changed (1)
  1. README.md +2 -5
README.md CHANGED
@@ -10,8 +10,8 @@ tags:
  - not-for-all-audiences
  ---
 
- **This model has a propensity to produce highly unsavoury content from the outset.
- It is not intended or suitable for general use.**
+ **This adversarial model has a propensity to produce highly unsavoury content from the outset.
+ It is not intended or suitable for general use or human consumption.**
 
  This special-use model aims to provide prompts that goad LLMs into producting "toxicity".
  Toxicity here is defined by the content of the [Civil Comments](https://medium.com/@aja_15265/saying-goodbye-to-civil-comments-41859d3a2b1d) dataset, containing
@@ -31,7 +31,4 @@ These prompt-response pairs are taken from the Anthropic HHRLHF corpus ([paper](
  filtered to those exchanges in which the model produced "toxicity" as defined above,
  using the [martin-ha/toxic-comment-model](https://huggingface.co/martin-ha/toxic-comment-model) DistilBERT classifier based on that data.
 
- **This model has a propensity to produce highly unsavoury content from the outset.
- It is not intended or suitable for general use.**
-
  See https://interhumanagreement.substack.com/p/faketoxicityprompts-automatic-red for details on the training process.
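
The README text in this diff describes keeping only those prompt-response exchanges whose response a classifier flags as toxic. A minimal sketch of that filtering step might look like the following. It is an illustration under stated assumptions, not the author's actual pipeline: the function names, the `"toxic"` label string, and the `0.5` threshold are all hypothetical and should be checked against the classifier's own configuration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Exchange = Tuple[str, str]  # (prompt, response)


def filter_toxic_exchanges(
    exchanges: Iterable[Exchange],
    classify: Callable[[str], Dict],
    toxic_label: str = "toxic",  # assumed label name; verify in the model config
    threshold: float = 0.5,      # assumed cut-off; the README does not state one
) -> List[Exchange]:
    """Keep the exchanges whose response is classified as toxic."""
    kept = []
    for prompt, response in exchanges:
        result = classify(response)  # expected shape: {"label": str, "score": float}
        if result["label"] == toxic_label and result["score"] >= threshold:
            kept.append((prompt, response))
    return kept


def load_hf_classifier(model_name: str = "martin-ha/toxic-comment-model"):
    """Wrap a Hugging Face text-classification pipeline as a `classify` callable.

    Imported lazily so the filter above can be used without `transformers`.
    """
    from transformers import pipeline

    pipe = pipeline("text-classification", model=model_name)
    # For a single string the pipeline returns a one-element list of dicts.
    return lambda text: pipe(text, truncation=True)[0]
```

With `transformers` installed, `filter_toxic_exchanges(pairs, load_hf_classifier())` would run the filter against the classifier named in the README. Passing the classifier in as a callable keeps the filter itself easy to test with a stub.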