Low-Rank Adapters Meet Neural Architecture Search for LLM Compression • arXiv:2501.16372 • Published Jan 23, 2025 • 9 upvotes
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models • arXiv:2501.16937 • Published Jan 28, 2025 • 6 upvotes
Identifying Sensitive Weights via Post-quantization Integral • arXiv:2503.01901 • Published Mar 2025 • 7 upvotes
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test • arXiv:2503.01840 • Published Mar 2025 • 4 upvotes
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models • arXiv:2503.07605 • Published Mar 2025 • 65 upvotes
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs • arXiv:2503.07067 • Published Mar 2025 • 29 upvotes
Efficient Distillation of Classifier-Free Guidance using Adapters • arXiv:2503.07274 • Published Mar 2025 • 4 upvotes
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories • arXiv:2503.07699 • Published Mar 2025 • 5 upvotes