Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2310.01377

Moral Foundations of Large Language Models

Paper • 2310.15337 • Published Oct 23, 2023 • 1
Specific versus General Principles for Constitutional AI

Paper • 2310.13798 • Published Oct 20, 2023 • 3
Contrastive Prefence Learning: Learning from Human Feedback without RL

Paper • 2310.13639 • Published Oct 20, 2023 • 25
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 47

about 16 hours ago

UltraFeedback: Boosting Language Models with High-quality Feedback

Paper • 2310.01377 • Published Oct 2, 2023 • 5

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Paper • 2312.00849 • Published Dec 1, 2023 • 9
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

Paper • 2405.17220 • Published May 27, 2024 • 1
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

Paper • 2304.05977 • Published Apr 12, 2023 • 1
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

Paper • 2406.12845 • Published Jun 18, 2024 • 1

Synthetic Data Generation

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 142
Textbooks Are All You Need II: phi-1.5 technical report

Paper • 2309.05463 • Published Sep 11, 2023 • 87
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Paper • 2305.07759 • Published May 12, 2023 • 33
Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 97

UltraFeedback: Boosting Language Models with High-quality Feedback

Paper • 2310.01377 • Published Oct 2, 2023 • 5
Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15, 2024 • 83
Natural Language Reinforcement Learning

Paper • 2411.14251 • Published Nov 21, 2024 • 28
Group Robust Preference Optimization in Reward-free RLHF

Paper • 2405.20304 • Published May 30, 2024 • 1

Papers - Training - Instruction-Following

Alpaca eval: https://github.com/tatsu-lab/alpaca_eval

UltraFeedback: Boosting Language Models with High-quality Feedback

Paper • 2310.01377 • Published Oct 2, 2023 • 5
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Paper • 2305.14387 • Published May 22, 2023 • 1

Papers - Fine-tuning - PPO

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2, 2024 • 21
UltraFeedback: Boosting Language Models with High-quality Feedback

Paper • 2310.01377 • Published Oct 2, 2023 • 5
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Paper • 2305.14387 • Published May 22, 2023 • 1

Papers - University - Tsinghua University

Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1, 2024 • 13
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes

Paper • 2404.00987 • Published Apr 1, 2024 • 22
Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2, 2024 • 44
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

Paper • 2404.02893 • Published Apr 3, 2024 • 21

Papers - Reward Model - Training

Transforming and Combining Rewards for Aligning Large Language Models

Paper • 2402.00742 • Published Feb 1, 2024 • 12
UltraFeedback: Boosting Language Models with High-quality Feedback

Paper • 2310.01377 • Published Oct 2, 2023 • 5
Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15, 2024 • 83
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2, 2024 • 121

Papers - Reward Model

Fine-Tuning Language Models from Human Preferences

Paper • 1909.08593 • Published Sep 18, 2019 • 3
Transforming and Combining Rewards for Aligning Large Language Models

Paper • 2402.00742 • Published Feb 1, 2024 • 12
Leverage the Average: an Analysis of KL Regularization in RL

Paper • 2003.14089 • Published Mar 31, 2020 • 2
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Paper • 2404.01258 • Published Apr 1, 2024 • 12

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs