ProX Dataset Collection • A collection of pre-training corpora refined by ProX • 5 items • Updated 2 days ago • 5
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories • Paper • 2409.07440 • Published Sep 11, 2024 • 6 • 2
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research • Paper • 2402.00159 • Published Jan 31, 2024 • 62