stanfordnlp/SteamSHP-flan-t5-xl

text2text-generation

preference model

text-generation-inference


kawine committed on Feb 25, 2023

Commit c3fd975 · 1 Parent(s): f7b2171

Update README.md

Files changed (1)

README.md +5 -5

@@ -39,11 +39,11 @@ Despite being 1/4 of the size, it is on average only 0.75 points less accurate o
 The input text should be of the format:
 ```
-POST: { the context, such as the 'history' column in SHP }
-RESPONSE A: { first possible continuation }
-RESPONSE B: { second possible continuation }
+POST: { the context, such as the 'history' column in SHP (not containing any newlines \n) }
+RESPONSE A: { first possible continuation (not containing any newlines \n) }
+RESPONSE B: { second possible continuation (not containing any newlines \n) }
 Which response is better? RESPONSE
 ```
@@ -75,9 +75,9 @@ When trying to cram an example into 512 tokens, we recommend truncating the cont
 If you want to use SteamSHP-XL as a reward model -- to get a score for a single response -- then you need to structure the input such that RESPONSE A is what you want to score and RESPONSE B is just an empty input:
 ```
-POST: { the context, such as the 'history' column in SHP }
-RESPONSE A: { continuation }
+POST: { the context, such as the 'history' column in SHP (not containing any newlines \n) }
+RESPONSE A: { continuation (not containing any newlines \n) }
 RESPONSE B: .
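
The newline constraint added in this commit can be enforced when building the input string. The following is a minimal sketch, not code from the README: the helper names `one_line`, `build_preference_input`, and `build_reward_input` are hypothetical, and collapsing newlines into single spaces is only one possible way to satisfy "not containing any newlines". The closing question line in the reward-model case is also an assumption, since the second hunk is cut off before that point.

```python
def one_line(text: str) -> str:
    """Collapse newlines (and any other whitespace runs) into single spaces."""
    return " ".join(text.split())


def build_preference_input(post: str, response_a: str, response_b: str) -> str:
    """Build the pairwise input; each field is flattened to a single line."""
    return (
        f"POST: {one_line(post)}\n"
        f"RESPONSE A: {one_line(response_a)}\n"
        f"RESPONSE B: {one_line(response_b)}\n"
        "Which response is better? RESPONSE"
    )


def build_reward_input(post: str, response: str) -> str:
    """Reward-model variant: RESPONSE B is the placeholder '.'.

    Assumption: the same closing question line is used as in the
    pairwise format; the diff hunk does not show it.
    """
    return build_preference_input(post, response, ".")


if __name__ == "__main__":
    print(build_preference_input("How do I\nfix this?", "Try\nA", "Try B"))
```

Only the three separators between the four lines remain as newlines in the result; any newline inside a field is replaced by a space.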