ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models • Paper • 2402.00794 • Published Feb 1, 2024 • 1
Rethinking Interpretability in the Era of Large Language Models • Paper • 2402.01761 • Published Jan 30, 2024 • 23
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models • Paper • 2502.03032 • Published 7 days ago • 53