- Secrets of RLHF in Large Language Models Part I: PPO
  Paper • 2307.04964 • Published • 29
- Safe RLHF: Safe Reinforcement Learning from Human Feedback
  Paper • 2310.12773 • Published • 28
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
  Paper • 2309.10202 • Published • 11
- Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
  Paper • 2310.00212 • Published • 2
Collections including paper arxiv:2307.04964
- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 13
- Learning to summarize from human feedback
  Paper • 2009.01325 • Published • 4
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 17
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 244
- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 54
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
  Paper • 2309.04269 • Published • 33
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87