Safetensors
English
distilbert
leondz committed on
Commit 906ac60 · verified · 1 Parent(s): 5bdc57e

Update README.md

Files changed (1)
  1. README.md +15 -3
README.md CHANGED
@@ -1,3 +1,15 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ ---
+
+ This classifier is trained on LLM responses designed to capture a model refuting a false claim.
+
+ The responses are outputs of OpenAI gpt-3.5-turbo, generated on June 1, 2023.
+ Prompts are constructed by prepending "Explain why" to claims from the True-False dataset provided alongside [The Internal State of an LLM Knows When It's Lying](https://arxiv.org/abs/2304.13734).
+ Only the first sentence of each response is included (as split by nltk.sent_tokenize).
+ The original labels are used, where 0 corresponds to a false claim.
+ That is, almost all responses labeled 0 should be refutations.
+ Spurious and missing refutations were removed by hand, which amounted to about 10% of the data.
+ These were caused by either an incorrect model assertion or errors in the source data.
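
The data construction described in the added README text can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the function names `build_prompt` and `first_sentence` are hypothetical, the single space after "Explain why" is an assumption, the example claim and response are made up, and a simple regex split stands in for `nltk.sent_tokenize` so that the snippet runs without downloading tokenizer data.

```python
import re


def build_prompt(claim: str) -> str:
    """Prepend "Explain why" to a claim (the exact spacing is an assumption)."""
    return f"Explain why {claim}"


def first_sentence(response: str) -> str:
    """Return the first sentence of a model response.

    Stand-in for nltk.sent_tokenize(response)[0]: split once at whitespace
    that follows '.', '!' or '?'. The real tokenizer handles abbreviations
    and similar edge cases that this regex does not.
    """
    parts = re.split(r"(?<=[.!?])\s+", response.strip(), maxsplit=1)
    return parts[0]


# Hypothetical example: a false claim (label 0) and an invented response.
claim = "The Earth is flat."
response = "The Earth is not flat. It is an oblate spheroid."

prompt = build_prompt(claim)
example = {"text": first_sentence(response), "label": 0}
```

With nltk and its sentence tokenizer data installed, `first_sentence` could presumably be replaced by `nltk.sent_tokenize(response)[0]` to match the README more closely.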