- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
  Paper • 2402.13720 • Published • 7
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 32
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 153
Collections including paper arxiv:2405.14860
- PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
  Paper • 2402.08714 • Published • 14
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 25
- RLVF: Learning from Verbal Feedback without Overgeneralization
  Paper • 2402.10893 • Published • 12
- Coercing LLMs to do and reveal (almost) anything
  Paper • 2402.14020 • Published • 13
- A Close Look at Decomposition-based XAI-Methods for Transformer Language Models
  Paper • 2502.15886 • Published • 1
- We Can't Understand AI Using our Existing Vocabulary
  Paper • 2502.07586 • Published • 10
- Position-aware Automatic Circuit Discovery
  Paper • 2502.04577 • Published • 1
- Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
  Paper • 2501.18887 • Published • 1