MiniPLM: Knowledge Distillation for Pre-Training Language Models (arXiv:2410.17215, published Oct 22, 2024)
Data Selection via Optimal Control for Language Models (arXiv:2410.07064, published Oct 9, 2024)
Direct Preference Knowledge Distillation for Large Language Models (arXiv:2406.19774, published Jun 28, 2024)
Instruction Pre-Training: Language Models are Supervised Multitask Learners (arXiv:2406.14491, published Jun 20, 2024)
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models (arXiv:2402.13064, published Feb 20, 2024)