Model description

This repository contains over 500 model checkpoints for the paper Loss-to-Loss Prediction: Scaling Laws for All Datasets. The models range in size from 20M to 3.3B parameters, span FLOP budgets from 2e17 to 1e21 FLOPs, and are trained on 6 different pretraining datasets.

Each subdirectory name encodes four parameters that identify the model it contains:

  • Dataset: one of fineweb-100b, fineweb-edu-100b, proof-pile-2, slimpajama-chunk1, smollm-corpus, or starcoder
  • N: the number of model parameters
  • D: the number of training tokens
  • C: the number of training FLOPs

For example, a model trained on starcoder with 1.1e08 parameters on 3.0e08 tokens for a total of 2.0e17 FLOPs would have the name: L2L_starcoder_N1.1e08_D3.0e08_C2.0e17/
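If you want to work with these names programmatically, a small helper like the one below (a hypothetical sketch, not part of the released code) can recover the dataset, N, D, and C values from a checkpoint directory name:

# Hypothetical helper (not part of the released code) that parses a
# checkpoint directory name of the form L2L_<dataset>_N<params>_D<tokens>_C<flops>.
import re

def parse_checkpoint_name(name: str) -> dict:
    pattern = r"^L2L_(?P<dataset>.+)_N(?P<N>[\d.e+]+)_D(?P<D>[\d.e+]+)_C(?P<C>[\d.e+]+)/?$"
    match = re.match(pattern, name)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {name}")
    return {
        "dataset": match.group("dataset"),
        "params": float(match.group("N")),
        "tokens": float(match.group("D")),
        "flops": float(match.group("C")),
    }

print(parse_checkpoint_name("L2L_starcoder_N1.1e08_D3.0e08_C2.0e17/"))
# {'dataset': 'starcoder', 'params': 110000000.0, 'tokens': 300000000.0, 'flops': 2e+17}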

Full training details for the models can be found in the training repository or paper.

How to load a model

First, follow the instructions in the training repository to install our fork of the OLMo package.

With the fork installed, you can then use the huggingface_hub and transformers packages to download and load a model with the following snippet:

from olmo.model import HFMixinOLMo
from huggingface_hub import snapshot_download

tmp_dir = "tmp"
model_name = "L2L_starcoder_N1.1e08_D3.0e08_C2.0e17"

# Download only the files for the chosen checkpoint into a local directory.
snapshot_download(
    repo_id="KempnerInstituteAI/loss-to-loss",
    allow_patterns=f"{model_name}/*",
    local_dir=tmp_dir,
)

# Load the checkpoint with the OLMo fork's Hugging Face mixin.
model = HFMixinOLMo.from_pretrained(f"{tmp_dir}/{model_name}")
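
To discover which checkpoints are available, you can list the repository contents with huggingface_hub and collect the top-level directory names. The snippet below is a minimal sketch along those lines:

from huggingface_hub import HfApi

# List every file in the repository and collect the top-level
# checkpoint directory names (one per model).
api = HfApi()
files = api.list_repo_files("KempnerInstituteAI/loss-to-loss")
model_names = sorted({path.split("/")[0] for path in files if path.startswith("L2L_")})
print(f"{len(model_names)} checkpoints found")
print(model_names[:5])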

Citation

If you use these models in your research, please cite this paper:

@article{brandfonbrener2024loss,
  title={Loss-to-Loss Prediction: Scaling Laws for All Datasets},
  author={Brandfonbrener, David and Anand, Nikhil and Vyas, Nikhil and Malach, Eran and Kakade, Sham},
  journal={arXiv preprint arXiv:2411.12925},
  year={2024}
}

License

These models are licensed under Apache 2.0 and are intended for research and educational use.
