Update README.md
README.md
@@ -37,7 +37,7 @@ The total number of the comparison pairs is 250K, where we perform the following
 
 ### Training
 
-We train the model for one epoch with a learning rate of 5e-6, batch size 256, cosine learning rate decay with a warmup ratio 0.03. You can see my training script here: https://github.com/WeiXiongUST/RAFT-Reward-Ranked-Finetuning/blob/main/reward_modeling.py
+We train the model for one epoch with a learning rate of 5e-6, batch size 256, cosine learning rate decay with a warmup ratio 0.03. You can see my training script here: https://github.com/WeiXiongUST/RAFT-Reward-Ranked-Finetuning/blob/main/reward_modeling.py , which is modified from the TRL package.
 
 
 
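The Training paragraph names a peak learning rate of 5e-6, one epoch, batch size 256, and cosine decay with a warmup ratio of 0.03. The sketch below shows one way that schedule could play out per optimizer step. It assumes the linear-warmup-then-cosine shape used by Hugging Face Transformers' `get_cosine_schedule_with_warmup` (with warmup steps computed as `ceil(total_steps * warmup_ratio)`), which TRL-based scripts typically inherit. It also assumes that all ~250K comparison pairs are used for training with no gradient accumulation. Neither assumption is confirmed by the README, and `cosine_lr_with_warmup` is a hypothetical helper, not a function from the linked script.

```python
import math


def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-6, warmup_ratio=0.03):
    """Hypothetical helper: learning rate at optimizer step `step` (0-indexed).

    Assumed shape: linear warmup from 0 to peak_lr, then cosine decay to 0.
    """
    # Assumption: warmup length is ceil(total_steps * warmup_ratio), as in Transformers.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup: ramps from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from peak_lr down to 0.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumption: ~250K pairs, global batch size 256, one epoch, no gradient accumulation.
total_steps = math.ceil(250_000 / 256)             # 977 optimizer steps
warmup_steps = math.ceil(total_steps * 0.03)       # 30 warmup steps
for s in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(s, cosine_lr_with_warmup(s, total_steps))
```

Under these assumptions, warmup covers only about 30 of roughly 977 steps. The learning rate reaches 5e-6 around step 30 and then decays smoothly to 0 by the end of the single epoch.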