🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 14 items • Updated 3 days ago • 101
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published 22 days ago • 12
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Paper • 2410.10819 • Published Oct 14, 2024 • 7
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Paper • 2410.10819 • Published Oct 14, 2024 • 7
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Paper • 2410.10819 • Published Oct 14, 2024 • 7 • 2
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference Paper • 2406.10774 • Published Jun 16, 2024 • 3
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Paper • 2405.04532 • Published May 7, 2024
Retrieval Head Mechanistically Explains Long-Context Factuality Paper • 2404.15574 • Published Apr 24, 2024 • 3
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory Paper • 2402.04617 • Published Feb 7, 2024 • 4
BitDelta: Your Fine-Tune May Only Be Worth One Bit Paper • 2402.10193 • Published Feb 15, 2024 • 22
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models Paper • 2211.10438 • Published Nov 18, 2022 • 4