Low-Rank Adapters Meet Neural Architecture Search for LLM Compression • arXiv:2501.16372 • Published Jan 23, 2025 • 9 upvotes
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models • arXiv:2501.16937 • Published Jan 28, 2025 • 6 upvotes
Identifying Sensitive Weights via Post-quantization Integral • arXiv:2503.01901 • Published Mar 2025 • 7 upvotes
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test • arXiv:2503.01840 • Published Mar 2025 • 4 upvotes
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models • arXiv:2503.07605 • Published Mar 2025 • 65 upvotes
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs • arXiv:2503.07067 • Published Mar 2025 • 29 upvotes
Efficient Distillation of Classifier-Free Guidance using Adapters • arXiv:2503.07274 • Published Mar 2025 • 4 upvotes
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories • arXiv:2503.07699 • Published Mar 2025 • 5 upvotes