SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
Abstract
We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses. Instead of only relying on costly and labor-intensive annotations, SelfCite leverages a reward signal provided by the LLM itself through context ablation: If a citation is necessary, removing the cited text from the context should prevent the same response; if sufficient, retaining the cited text alone should preserve the same response. This reward can guide the inference-time best-of-N sampling strategy to improve citation quality significantly, as well as be used in preference optimization to directly fine-tune the models for generating better citations. The effectiveness of SelfCite is demonstrated by increasing citation F1 up to 5.3 points on the LongBench-Cite benchmark across five long-form question answering tasks.
Community
- We designed a self-supervised reward to align LLMs to generate better citations attributing the context when answering questions, without human supervision.
- SelfCite leverages a reward signal provided by the LLM itself through context ablation
- If a citation is necessary, removing the cited text should prevent the same response
- If a citation is sufficient, retaining the cited text alone should preserve the same response
- This reward can guide 1) best-of-N sampling and 2) fine-tuning with SimPO to generate better citations.
- Improves citation F1 by up to 5.3 points on the LongBench-Cite benchmark.
- The results are comparable to the specialized commercial "Claude Citations" API released in January 2025.
Implementation: https://github.com/voidism/SelfCite
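The context-ablation reward described above can be sketched in a few lines. This is a toy illustration, not the released implementation: `log_prob` is a hypothetical scorer standing in for the LLM's log-probability of the response given a context, and `toy_log_prob` is a word-overlap stand-in so the sketch runs end to end.

```python
import math

def ablation_reward(log_prob, response, context_sents, cited_idx):
    """Score a citation set by context ablation (toy sketch).

    Necessity ("prob-drop"): removing the cited sentences from the context
    should make the same response less likely.
    Sufficiency ("prob-hold"): keeping only the cited sentences should keep
    the response likely.
    """
    full = " ".join(context_sents)
    without = " ".join(s for i, s in enumerate(context_sents) if i not in cited_idx)
    only = " ".join(context_sents[i] for i in sorted(cited_idx))

    lp_full = log_prob(response, full)
    prob_drop = lp_full - log_prob(response, without)   # necessity
    prob_hold = log_prob(response, only) - lp_full      # sufficiency
    return prob_drop + prob_hold

def toy_log_prob(response, context):
    # Stand-in scorer: log of the fraction of response words found in the
    # context. A real implementation would query the LLM's token log-probs.
    words = response.lower().split()
    ctx = set(context.lower().split())
    hit = sum(w in ctx for w in words) / max(len(words), 1)
    return math.log(hit + 1e-9)

sents = ["The sky is blue.", "Cats sleep a lot."]
good = ablation_reward(toy_log_prob, "the sky is blue.", sents, {0})
bad = ablation_reward(toy_log_prob, "the sky is blue.", sents, {1})
print(good > bad)  # the correct citation scores higher
```

With this reward in hand, best-of-N sampling simply generates N candidate citation sets and keeps the highest-scoring one; the same scores can also rank candidate pairs as preference data for fine-tuning.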
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling (2024)
- Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations (2024)
- ChainRank-DPO: Chain Rank Direct Preference Optimization for LLM Rankers (2024)
- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response (2024)
- Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models (2025)
- CiteBART: Learning to Generate Citations for Local Citation Recommendation (2024)
- Redefining Simplicity: Benchmarking Large Language Models from Lexical to Document Simplification (2025)