Update README.md
README.md
CHANGED
@@ -10,8 +10,8 @@ tags:
 - not-for-all-audiences
 ---
 
-**This model has a propensity to produce highly unsavoury content from the outset.
-It is not intended or suitable for general use.**
+**This adversarial model has a propensity to produce highly unsavoury content from the outset.
+It is not intended or suitable for general use or human consumption.**
 
 This special-use model aims to provide prompts that goad LLMs into producting "toxicity".
 Toxicity here is defined by the content of the [Civil Comments](https://medium.com/@aja_15265/saying-goodbye-to-civil-comments-41859d3a2b1d) dataset, containing
@@ -31,7 +31,4 @@ These prompt-response pairs are taken from the Anthropic HHRLHF corpus ([paper](
 filtered to those exchanges in which the model produced "toxicity" as defined above,
 using the [martin-ha/toxic-comment-model](https://huggingface.co/martin-ha/toxic-comment-model) DistilBERT classifier based on that data.
 
-**This model has a propensity to produce highly unsavoury content from the outset.
-It is not intended or suitable for general use.**
-
 See https://interhumanagreement.substack.com/p/faketoxicityprompts-automatic-red for details on the training process.