Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 5 days ago • 128
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published 1 day ago • 19
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models Paper • 2502.12464 • Published 3 days ago • 26
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published 10 days ago • 42
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 8 days ago • 138 • 6
HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning Paper • 2406.09827 • Published Jun 14, 2024 • 2
Mastering Long Contexts in LLMs with KVPress Article • By nvidia and 1 other • 29 days ago • 63
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published Jan 10 • 67
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding Paper • 2412.02186 • Published Dec 3, 2024 • 22
SEA: Sparse Linear Attention with Estimated Attention Mask Paper • 2310.01777 • Published Oct 3, 2023 • 1