- Secrets of RLHF in Large Language Models Part I: PPO
  Paper • 2307.04964 • Published • 29
- Safe RLHF: Safe Reinforcement Learning from Human Feedback
  Paper • 2310.12773 • Published • 28
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
  Paper • 2309.10202 • Published • 11
- Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
  Paper • 2310.00212 • Published • 2
Collections including paper arxiv:2307.04964
- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 13
- Learning to summarize from human feedback
  Paper • 2009.01325 • Published • 4
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 17
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 244
- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 54
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
  Paper • 2309.04269 • Published • 33
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87