qgyd2021 committed on
Commit e027c52 · 1 Parent(s): 455e6d5

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -15,10 +15,11 @@ fine-tuned [GPT2](https://huggingface.co/gpt2) to a reward model.
 
 The model is designed to generate human-like responses to questions in [Stack Exchange](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) domains of programming, mathematics, physics, and more.
 
-For more info check out the blog post and github [example](https://github.com/huggingface/trl/tree/main/examples/research_projects/stack_llama_2/scripts).
+For training code check the github [example](https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/reward_modeling.py).
 
 info:
 * epoch: 1.0
 * train_loss: 0.641692199903866
 * eval_loss: 0.6299035549163818
 * eval_accuracy: 0.729
+