HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning • arXiv:2406.09827 • Published Jun 14, 2024
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU • arXiv:2502.08910 • Published Feb 2025