arXiv:2502.01667

Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking

Published on Feb 1, 2025

Abstract

Direct preference optimization (DPO) has shown success in aligning diffusion models with human preferences. Previous approaches typically assume that the preference label of a final generation also holds for the noisy samples at intermediate denoising steps, and directly apply DPO to these noisy samples during fine-tuning. However, we theoretically identify inherent issues with this assumption and their impact on the effectiveness of preference alignment. We first demonstrate these issues from two perspectives, gradient direction and preference order, and then propose the Tailored Preference Optimization (TailorPO) framework for aligning diffusion models with human preference, underpinned by these theoretical insights. Our approach directly ranks intermediate noisy samples by their step-wise reward and resolves the gradient-direction issues through a simple yet efficient design. Additionally, we incorporate the gradient guidance of diffusion models into preference alignment to further enhance optimization effectiveness. Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
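The abstract is high-level, so the following is only a rough sketch of the core idea as described: at a single denoising step, two intermediate noisy samples are ranked by a step-wise reward, and a standard DPO objective is applied to that pair. Every name here (tailored_dpo_step_loss, the log-prob and reward inputs, the pairing logic) is an illustrative assumption, not the paper's released code.

    # Hypothetical sketch of step-wise preference ranking for DPO-style
    # fine-tuning of a diffusion model; names and pairing logic are
    # assumptions for illustration, not the paper's implementation.
    import torch
    import torch.nn.functional as F

    def tailored_dpo_step_loss(
        logp_policy_a, logp_policy_b,  # policy log-probs of two noisy samples at step t
        logp_ref_a, logp_ref_b,        # frozen reference-model log-probs of the same samples
        reward_a, reward_b,            # step-wise rewards evaluated at step t
        beta: float = 0.1,
    ):
        """DPO loss at one denoising step, with the winner/loser pair
        decided by the step-wise reward rather than the final image's label."""
        # Rank the intermediate samples directly by their step-wise reward.
        a_wins = reward_a >= reward_b
        logp_w_pol = torch.where(a_wins, logp_policy_a, logp_policy_b)
        logp_l_pol = torch.where(a_wins, logp_policy_b, logp_policy_a)
        logp_w_ref = torch.where(a_wins, logp_ref_a, logp_ref_b)
        logp_l_ref = torch.where(a_wins, logp_ref_b, logp_ref_a)

        # Standard DPO objective on the step-wise ranked pair.
        logits = beta * ((logp_w_pol - logp_w_ref) - (logp_l_pol - logp_l_ref))
        return -F.logsigmoid(logits).mean()

    # Toy usage with random per-sample log-probs and rewards at one step:
    logp_pol_a = torch.randn(8, requires_grad=True)
    logp_pol_b = torch.randn(8, requires_grad=True)
    loss = tailored_dpo_step_loss(logp_pol_a, logp_pol_b,
                                  torch.randn(8), torch.randn(8),
                                  torch.rand(8), torch.rand(8))
    loss.backward()  # would propagate into the policy network in practice

The point of the ranking step is what distinguishes this from prior diffusion-DPO variants as the abstract frames them: the winner/loser assignment comes from the reward at the current step, not from a label inherited from the final generation.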
