- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 1
Collections including paper arxiv:2406.11794
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 50
- Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
  Paper • 2410.02749 • Published • 12
- Fewer Truncations Improve Language Modeling
  Paper • 2404.10830 • Published • 3
- How to Train Long-Context Language Models (Effectively)
  Paper • 2410.02660 • Published • 2
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 352
- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 141
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 3
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 162
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
  Paper • 2402.14848 • Published • 19
- The Prompt Report: A Systematic Survey of Prompting Techniques
  Paper • 2406.06608 • Published • 60
- CRAG -- Comprehensive RAG Benchmark
  Paper • 2406.04744 • Published • 47
- Transformers meet Neural Algorithmic Reasoners
  Paper • 2406.09308 • Published • 44
- MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
  Paper • 2405.07526 • Published • 21
- Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
  Paper • 2405.15613 • Published • 17
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 15
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 31
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 18
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  Paper • 2406.17557 • Published • 93
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 50
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 128
- Getting it Right: Improving Spatial Consistency in Text-to-Image Models
  Paper • 2404.01197 • Published • 31
- CosmicMan: A Text-to-Image Foundation Model for Humans
  Paper • 2404.01294 • Published • 16
- mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
  Paper • 2406.08707 • Published • 16
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 50
- World Model on Million-Length Video And Language With RingAttention
  Paper • 2402.08268 • Published • 38
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 80
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- FiT: Flexible Vision Transformer for Diffusion Model
  Paper • 2402.12376 • Published • 48