Collections including paper arXiv:2402.08609

- Moral Foundations of Large Language Models (arXiv:2310.15337)
- Specific versus General Principles for Constitutional AI (arXiv:2310.13798)
- Contrastive Preference Learning: Learning from Human Feedback without RL (arXiv:2310.13639)
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (arXiv:2309.00267)
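
The thread tying this collection together is replacing human preference labels with AI feedback. As a rough orientation to the core RLAIF step, here is a hedged sketch; `policy.generate` and `judge_llm` are hypothetical stand-ins for whatever generation and judge APIs are in play, and the actual paper uses more careful labeling prompts (e.g., chain-of-thought and position debiasing) than this:

```python
def rlaif_label(prompt: str, policy, judge_llm) -> tuple[str, str]:
    # `policy.generate` and `judge_llm` are hypothetical stand-ins for any
    # sampling API and any strong LLM used as the preference labeller.
    a, b = policy.generate(prompt), policy.generate(prompt)
    verdict = judge_llm(
        "Which response to the prompt is better, A or B? Answer with one letter.\n"
        f"Prompt: {prompt}\nA: {a}\nB: {b}"
    )
    # The resulting (chosen, rejected) pair then feeds a reward model
    # or a direct preference-optimization loss.
    return (a, b) if verdict.strip().upper().startswith("A") else (b, a)
```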

- Instruction Pre-Training: Language Models are Supervised Multitask Learners (arXiv:2406.14491)
- Pre-training Small Base LMs with Fewer Tokens (arXiv:2404.08634)
- Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training (arXiv:2405.15319)
- Can LLMs Learn by Teaching? A Preliminary Study (arXiv:2406.14629)
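
Of the pre-training ideas above, the growth operator studied in "Stacking Your Transformers" is the most mechanical: initialize a deeper model by duplicating the layers of a smaller trained one. A minimal sketch under that reading (plain depth-wise duplication of generic layers; the paper analyzes growth factors and schedules in far more detail):

```python
import copy
import torch.nn as nn

def grow_by_stacking(layers: nn.ModuleList, factor: int = 2) -> nn.ModuleList:
    # Initialise a `factor`-times deeper stack by repeating the trained layers,
    # so the grown model starts from the small model's weights rather than random init.
    grown = []
    for _ in range(factor):
        grown.extend(copy.deepcopy(layer) for layer in layers)
    return nn.ModuleList(grown)
```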

- RLHF Workflow: From Reward Modeling to Online RLHF (arXiv:2405.07863)
- Understanding and Diagnosing Deep Reinforcement Learning (arXiv:2406.16979)
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv:2404.03715)
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning (arXiv:2407.00617)
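
Several of these papers start from the same primitive: a Bradley-Terry reward model fit on pairwise preferences (the "reward modeling" stage of the RLHF workflow; the two Nash papers then replace the point-wise reward with general preferences). A minimal sketch of that standard pairwise loss, assuming a reward model that emits a scalar score per (prompt, response):

```python
import torch.nn.functional as F

def reward_model_loss(r_chosen, r_rejected):
    # Bradley-Terry pairwise loss: maximise the log-probability that the
    # chosen response outranks the rejected one, P(chosen > rejected)
    # = sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```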

- ORPO: Monolithic Preference Optimization without Reference Model (arXiv:2403.07691)
- sDPO: Don't Use Your Data All at Once (arXiv:2403.19270)
- Teaching Large Language Models to Reason with Reinforcement Learning (arXiv:2403.04642)
- Best Practices and Lessons Learned on Synthetic Data for Language Models (arXiv:2404.07503)
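
ORPO's pitch is preference optimization without a frozen reference model: an odds-ratio penalty is added to the ordinary SFT loss. A minimal sketch of my reading of that objective; the function name and the default λ are illustrative, not taken from the authors' code:

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, nll_chosen, lam=0.1):
    # chosen_logps / rejected_logps: average per-token log-probabilities of the
    # chosen and rejected responses under the model being trained, shape [batch].
    # nll_chosen: the usual SFT cross-entropy on the chosen responses.
    # lam: weight on the odds-ratio term (illustrative value).
    # log(odds) where odds = p / (1 - p), computed stably from log p:
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Penalise the model when the odds of the rejected response approach those
    # of the chosen one; note that no frozen reference model is involved.
    ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()
    return nll_chosen + lam * ratio_loss
```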

- BlackMamba: Mixture of Experts for State-Space Models (arXiv:2402.01771)
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models (arXiv:2402.01739)
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (arXiv:2401.15947)
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066)
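
All four papers build on the same sparse building block: a router sends each token to a few experts and mixes their outputs. Below is a generic top-k MoE layer for orientation only; it deliberately omits the load-balancing losses, shared experts, and fine-grained expert segmentation that distinguish systems like DeepSeekMoE:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts feed-forward layer (illustrative)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [n_tokens, d_model]; each token is routed to its top-k experts.
        scores, idx = self.router(x).topk(self.k, dim=-1)   # [n_tokens, k]
        weights = F.softmax(scores, dim=-1)                 # renormalise over the k picks
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e    # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```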

- Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning (arXiv:2402.06102)
- Mixtures of Experts Unlock Parameter Scaling for Deep RL (arXiv:2402.08609)
- In deep reinforcement learning, a pruned network is a good network (arXiv:2402.12479)
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (arXiv:2402.14083)
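
Two of these papers (arXiv:2402.08609 and arXiv:2402.12479) make a related point: deep-RL networks scale better when their parameters are used sparsely, whether via experts or via pruning. As a pointer to the latter, here is a one-function sketch of magnitude pruning using PyTorch's built-in utility; the 90% sparsity is an illustrative setting, and the paper itself prunes gradually over the course of training rather than in one shot:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def magnitude_prune(network: nn.Module, amount: float = 0.9) -> nn.Module:
    # Remove the smallest-magnitude weights across all Linear layers at once
    # (global unstructured L1 pruning); pruned weights are masked to zero.
    to_prune = [(m, "weight") for m in network.modules() if isinstance(m, nn.Linear)]
    prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=amount)
    return network
```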

- Exponentially Faster Language Modelling (arXiv:2311.10770)
- stabilityai/stable-video-diffusion-img2vid-xt (Hugging Face model, Image-to-Video)
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes (arXiv:2311.13384)
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis (arXiv:2311.12454)