AI & ML interests

NLP Research

Recent Activity

koalazf99 published a dataset 1 minute ago
gair-prox/DCLM-pro
koalazf99 updated a dataset about 2 hours ago
gair-prox/DCLM-pro
Pengfei authored a paper 8 days ago
LIMO: Less is More for Reasoning


GAIR-ProX, a subsidiary of GAIR, leads the 🫐 ProX Project. The project aims to make pre-training more efficient by using language models to refine corpus documents at scale. Refinement operations, such as document-level filtering and chunk-level cleaning, are implemented as scalable, executable programs. By raising the quality of pre-training data, 🫐 ProX aims to produce more robust and efficient language models.
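To make the idea of "refinement operations as executable programs" concrete, here is a minimal sketch of a tiny interpreter that applies a per-document program made of one document-level operation (dropping the document) and two chunk-level operations (deleting line ranges and normalizing strings). The operation names `DropDoc`, `RemoveLines`, and `Normalize`, and the functions `execute_program` and `refine_corpus`, are hypothetical illustrations, not the ProX project's actual program format or API. In ProX a language model produces the program for each document; here the program is written by hand as a stand-in for that model output.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class DropDoc:
    """Hypothetical document-level op: discard the whole document."""


@dataclass
class RemoveLines:
    """Hypothetical chunk-level op: delete lines start..end (0-indexed, inclusive)."""
    start: int
    end: int


@dataclass
class Normalize:
    """Hypothetical chunk-level op: replace every occurrence of `source` with `target`."""
    source: str
    target: str


Op = Union[DropDoc, RemoveLines, Normalize]


def execute_program(doc: str, program: List[Op]) -> Optional[str]:
    """Apply a refinement program to one document; return None if it is dropped."""
    lines = doc.split("\n")
    removed = set()
    # First pass: any DropDoc discards the document; line indices for
    # RemoveLines are collected so they always refer to the original lines.
    for op in program:
        if isinstance(op, DropDoc):
            return None
        if isinstance(op, RemoveLines):
            removed.update(range(op.start, op.end + 1))
    kept = [line for i, line in enumerate(lines) if i not in removed]
    text = "\n".join(kept)
    # Second pass: string normalizations run on the remaining text, in order.
    for op in program:
        if isinstance(op, Normalize):
            text = text.replace(op.source, op.target)
    return text


def refine_corpus(docs: List[str], programs: List[List[Op]]) -> List[str]:
    """Run one program per document and keep only the documents that survive."""
    refined = []
    for doc, program in zip(docs, programs):
        out = execute_program(doc, program)
        if out is not None:
            refined.append(out)
    return refined


# Example: strip navigation and footer lines, then normalize a spelling.
doc = "Home | About | Login\nProX refines pre-training data.\nCopyright 2024"
program = [RemoveLines(0, 0), RemoveLines(2, 2), Normalize("pre-training", "pretraining")]
print(execute_program(doc, program))  # ProX refines pretraining data.
```

Because each operation is plain data rather than free-form code, the interpreter stays deterministic and cheap, which is what lets this style of refinement run over very large corpora in parallel.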

Read our technical report!