Update README.md

Files changed (1): README.md

@@ -37,7 +37,7 @@ The total number of the comparison pairs is 250K, where we perform the following
 ### Training
-We train the model for one epoch with a learning rate of 5e-6, batch size 256, cosine learning rate decay with a warmup ratio 0.03. You can see my training script here: https://github.com/WeiXiongUST/RAFT-Reward-Ranked-Finetuning/blob/main/reward_modeling.py
+We train the model for one epoch with a learning rate of 5e-6, batch size 256, cosine learning rate decay with a warmup ratio 0.03. You can see my training script here: https://github.com/WeiXiongUST/RAFT-Reward-Ranked-Finetuning/blob/main/reward_modeling.py , which is modified from the TRL package.
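
For reference, the learning-rate schedule described in the changed line can be sketched as below. This is a minimal illustration, not the linked training script. It assumes linear warmup followed by cosine decay to zero, that 256 is the global batch size, and that all 250K comparison pairs mentioned in the hunk header are used for the single epoch. `lr_at_step` is a hypothetical helper name, and rounding the warmup step count up is also an assumption.

```python
import math


def lr_at_step(step, total_steps, peak_lr=5e-6, warmup_ratio=0.03):
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    # Assumption: the warmup step count is rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# One epoch over 250K pairs at batch size 256 (both assumed as stated above).
total_steps = math.ceil(250_000 / 256)
warmup_steps = math.ceil(total_steps * 0.03)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at_step(step, total_steps))
```

Under these assumptions the rate starts at zero, reaches its 5e-6 peak at the end of warmup, and decays to zero by the last step.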