- BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
  Paper • 2401.12522 • Published • 12
- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 49
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models
  Paper • 2402.02834 • Published • 16