Collections
Collections including paper arxiv:2404.01954
- HyperCLOVA X Technical Report
  Paper • 2404.01954 • Published • 23
- UltraFeedback: Boosting Language Models with High-quality Feedback
  Paper • 2310.01377 • Published • 5
- AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
  Paper • 2305.14387 • Published • 1
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 107

- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 53
- HyperCLOVA X Technical Report
  Paper • 2404.01954 • Published • 23
- Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
  Paper • 2404.09956 • Published • 12
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 84