Transformers
PyTorch
English
llama
reward model
RLHF
RLAIF
text-generation-inference
Duplicate from banghua/n_rm
6f8f5dc
15 Bytes
global_step1400