-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 17 -
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published -
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 1
Collections
Discover the best community collections!
Collections including paper arxiv:2311.03099
-
Meta-Learning a Dynamical Language Model
Paper • 1803.10631 • Published • 1 -
TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation
Paper • 2003.11963 • Published -
BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model
Paper • 2212.04960 • Published • 1 -
Continuous Learning in a Hierarchical Multiscale Neural Network
Paper • 1805.05758 • Published • 2
-
Qualitatively characterizing neural network optimization problems
Paper • 1412.6544 • Published • 4 -
Averaging Weights Leads to Wider Optima and Better Generalization
Paper • 1803.05407 • Published • 2 -
Merging Models with Fisher-Weighted Averaging
Paper • 2111.09832 • Published • 1 -
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Paper • 2203.05482 • Published • 6
-
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Paper • 2203.05482 • Published • 6 -
Editing Models with Task Arithmetic
Paper • 2212.04089 • Published • 6 -
Resolving Interference When Merging Models
Paper • 2306.01708 • Published • 14 -
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper • 2311.03099 • Published • 29
-
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
Paper • 2310.17653 • Published • 2 -
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper • 2311.03099 • Published • 29 -
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming
Paper • 2312.06908 • Published • 10 -
SaulLM-7B: A pioneering Large Language Model for Law
Paper • 2403.03883 • Published • 80
-
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 32 -
Tailoring Self-Rationalizers with Multi-Reward Distillation
Paper • 2311.02805 • Published • 7 -
Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • Published • 6 -
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper • 2309.11235 • Published • 15