You Do Not Fully Utilize Transformer's Representation Capacity • Paper • 2502.09245 • Published 28 days ago
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • 2502.11089 • Published 25 days ago
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM • Paper • 2408.12076 • Published Aug 22, 2024