No More Adam: Learning Rate Scaling at Initialization is All You Need • Paper • 2412.11768 • Published Dec 16, 2024 • 41
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks • Paper • 2412.14161 • Published Dec 18, 2024 • 51
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments • Paper • 2408.10945 • Published Aug 20, 2024 • 11
PDFTriage: Question Answering over Long, Structured Documents • Paper • 2309.08872 • Published Sep 16, 2023 • 54
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations • Paper • 2412.13171 • Published Dec 17, 2024 • 31
A Modern Self-Referential Weight Matrix That Learns to Modify Itself • Paper • 2202.05780 • Published Feb 11, 2022
How many words does ChatGPT know? The answer is ChatWords • Paper • 2309.16777 • Published Sep 28, 2023 • 1
Graph of Thoughts: Solving Elaborate Problems with Large Language Models • Paper • 2308.09687 • Published Aug 18, 2023 • 7
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking • Paper • 2306.05426 • Published Jun 8, 2023
Think before you speak: Training Language Models With Pause Tokens • Paper • 2310.02226 • Published Oct 3, 2023 • 2
What do tokens know about their characters and how do they know it? • Paper • 2206.02608 • Published Jun 6, 2022