MiniPLM: Knowledge Distillation for Pre-Training Language Models (arXiv:2410.17215, published Oct 22, 2024)
Data Selection via Optimal Control for Language Models (arXiv:2410.07064, published Oct 9, 2024)
Direct Preference Knowledge Distillation for Large Language Models (arXiv:2406.19774, published Jun 28, 2024)
Instruction Pre-Training: Language Models are Supervised Multitask Learners (arXiv:2406.14491, published Jun 20, 2024)
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models (arXiv:2402.13064, published Feb 20, 2024)