- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14
Collections including paper arxiv:2402.19173
- CodePlan: Repository-level Coding using LLMs and Planning
  Paper • 2309.12499 • Published • 77
- SCREWS: A Modular Framework for Reasoning with Revisions
  Paper • 2309.13075 • Published • 17
- MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
  Paper • 2310.03731 • Published • 29
- Lemur: Harmonizing Natural Language and Code for Language Agents
  Paper • 2310.06830 • Published • 34

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 143
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 15
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 3
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 138

- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
  Paper • 2405.07990 • Published • 20
- Large Language Models as Planning Domain Generators
  Paper • 2405.06650 • Published • 13
- AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
  Paper • 2404.12753 • Published • 43
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
  Paper • 2404.07972 • Published • 48