Update README.md
--- README.md
+++ README.md
@@ -39,11 +39,11 @@ Despite being 1/4 of the size, it is on average only 0.75 points less accurate o
 The input text should be of the format:
 
 ```
-POST: { the context, such as the 'history' column in SHP }
+POST: { the context, such as the 'history' column in SHP (not containing any newlines \n) }
 
-RESPONSE A: { first possible continuation }
+RESPONSE A: { first possible continuation (not containing any newlines \n) }
 
-RESPONSE B: { second possible continuation }
+RESPONSE B: { second possible continuation (not containing any newlines \n) }
 
 Which response is better? RESPONSE
 ```
@@ -75,9 +75,9 @@ When trying to cram an example into 512 tokens, we recommend truncating the cont
 If you want to use SteamSHP-XL as a reward model -- to get a score for a single response -- then you need to structure the input such that RESPONSE A is what you want to score and RESPONSE B is just an empty input:
 
 ```
-POST: { the context, such as the 'history' column in SHP }
+POST: { the context, such as the 'history' column in SHP (not containing any newlines \n) }
 
-RESPONSE A: { continuation }
+RESPONSE A: { continuation (not containing any newlines \n) }
 
 RESPONSE B: .
 
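The newline constraint this change documents can be sketched in a few lines of Python. This is a minimal illustration and is not code from the README: the helper names `flatten`, `build_preference_input`, and `build_reward_input` are hypothetical, and replacing line breaks with single spaces is only one possible way to satisfy "not containing any newlines". The reward-model variant additionally assumes that the template ends with the same question line as the comparison format, which the second hunk does not show.

```python
def flatten(text: str) -> str:
    """Join the non-empty lines of `text` with single spaces so no newline remains."""
    # Assumption: a line break can be replaced by a space without changing meaning.
    return " ".join(line.strip() for line in text.splitlines() if line.strip())


def build_preference_input(post: str, response_a: str, response_b: str) -> str:
    """Fill the comparison template with newline-free fields."""
    return (
        f"POST: {flatten(post)}\n\n"
        f"RESPONSE A: {flatten(response_a)}\n\n"
        f"RESPONSE B: {flatten(response_b)}\n\n"
        "Which response is better? RESPONSE"
    )


def build_reward_input(post: str, response: str) -> str:
    """Score a single response: RESPONSE B is the placeholder '.'."""
    # Assumption: the closing question line matches the comparison template.
    return build_preference_input(post, response, ".")


if __name__ == "__main__":
    print(build_reward_input("Is this\na good idea?", "Yes,\nfor two reasons."))
```

Only the field contents are flattened; the blank lines between `POST`, `RESPONSE A`, and `RESPONSE B` belong to the template itself and are kept.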