Cost-Optimal Grouped-Query Attention for Long-Context LLMs Paper • 2503.09579 • Published about 19 hours ago • 2
MARS: Unleashing the Power of Variance Reduction for Training Large Models Paper • 2411.10438 • Published Nov 15, 2024 • 13
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Paper • 2411.02335 • Published Nov 4, 2024 • 11
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling Paper • 2410.07145 • Published Oct 9, 2024 • 2
A failed experiment: Infini-Attention, and why we should keep trying? Article • Aug 14, 2024 • 60
CFDBench: A Large-Scale Benchmark for Machine Learning Methods in Fluid Dynamics Paper • 2310.05963 • Published Sep 13, 2023
Robust and Scalable Model Editing for Large Language Models Paper • 2403.17431 • Published Mar 26, 2024
$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens Paper • 2402.13718 • Published Feb 21, 2024 • 1
Sub-Character Tokenization for Chinese Pretrained Language Models Paper • 2106.00400 • Published Jun 1, 2021
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published Jun 22, 2024 • 14