Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations Paper • 2501.19066 • Published 12 days ago • 10
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published Jan 8 • 89
Position Information Emerges in Causal Transformers Without Positional Encodings via Similarity of Nearby Embeddings Paper • 2501.00073 • Published Dec 30, 2024 • 1
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models Paper • 2412.07171 • Published Dec 10, 2024 • 1
Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding Paper • 2501.00712 • Published Jan 1 • 6
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing Paper • 2501.00658 • Published Dec 31, 2024 • 7
Revisiting In-Context Learning with Long Context Language Models Paper • 2412.16926 • Published Dec 22, 2024 • 30
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published Dec 23, 2024 • 30
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published Dec 23, 2024 • 40
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 49
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference Paper • 2403.09636 • Published Mar 14, 2024 • 2
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29, 2024 • 51
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study Paper • 2401.17981 • Published Jan 31, 2024 • 1
What Algorithms can Transformers Learn? A Study in Length Generalization Paper • 2310.16028 • Published Oct 24, 2023 • 2