Model description

This repository contains over 500 model checkpoints for the paper Loss-to-Loss Prediction: Scaling Laws for All Datasets. The models range in size from 20M to 3.3B parameters, span FLOP budgets from 2e17 to 1e21 FLOPs, and are trained on 6 different pretraining datasets.

Each subdirectory name encodes four parameters that identify the model it contains:

  • Dataset: one of fineweb-100b, fineweb-edu-100b, proof-pile-2, slimpajama-chunk1, smollm-corpus, or starcoder
  • N: the number of model parameters
  • D: the number of training tokens
  • C: the number of training FLOPs

For example, a model trained on starcoder with 1.1e08 parameters on 3.0e08 tokens for a total of 2.0e17 FLOPs would have the name: L2L_starcoder_N1.1e08_D3.0e08_C2.0e17/
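If you want to work with these names programmatically, a small helper like the one below (a hypothetical sketch, not part of the released code) can recover the dataset, N, D, and C values from a checkpoint directory name:

# Hypothetical helper (not part of the released code) that parses a
# checkpoint directory name of the form L2L_<dataset>_N<params>_D<tokens>_C<flops>.
import re

def parse_checkpoint_name(name: str) -> dict:
    pattern = r"^L2L_(?P<dataset>.+)_N(?P<N>[\d.e+]+)_D(?P<D>[\d.e+]+)_C(?P<C>[\d.e+]+)/?$"
    match = re.match(pattern, name)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {name}")
    return {
        "dataset": match.group("dataset"),
        "params": float(match.group("N")),
        "tokens": float(match.group("D")),
        "flops": float(match.group("C")),
    }

print(parse_checkpoint_name("L2L_starcoder_N1.1e08_D3.0e08_C2.0e17/"))
# {'dataset': 'starcoder', 'params': 110000000.0, 'tokens': 300000000.0, 'flops': 2e+17}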

Full training details for the models can be found in the training repository or paper.

How to load a model

First, follow the instructions in the training repository to install our fork of the OLMo package.

With the fork installed, you can then use the huggingface_hub and transformers packages to download and load a model with the following snippet:

from olmo.model import HFMixinOLMo
from huggingface_hub import snapshot_download

tmp_dir = "tmp"
model_name = "L2L_starcoder_N1.1e08_D3.0e08_C2.0e17"

# Download only the files for the chosen checkpoint into a local directory.
snapshot_download(
    repo_id="KempnerInstituteAI/loss-to-loss",
    allow_patterns=f"{model_name}/*",
    local_dir=tmp_dir,
)

# Load the checkpoint with the OLMo fork's Hugging Face mixin.
model = HFMixinOLMo.from_pretrained(f"{tmp_dir}/{model_name}")
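
To discover which checkpoints are available, you can list the repository contents with huggingface_hub and collect the top-level directory names. The snippet below is a minimal sketch along those lines:

from huggingface_hub import HfApi

# List every file in the repository and collect the top-level
# checkpoint directory names (one per model).
api = HfApi()
files = api.list_repo_files("KempnerInstituteAI/loss-to-loss")
model_names = sorted({path.split("/")[0] for path in files if path.startswith("L2L_")})
print(f"{len(model_names)} checkpoints found")
print(model_names[:5])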

Citation

If you use these models in your research, please cite this paper:

@article{brandfonbrener2024loss,
  title={Loss-to-Loss Prediction: Scaling Laws for All Datasets},
  author={Brandfonbrener, David and Anand, Nikhil and Vyas, Nikhil and Malach, Eran and Kakade, Sham},
  journal={arXiv preprint arXiv:2411.12925},
  year={2024}
}

License

These models are licensed under Apache 2.0 and are intended for research and educational use.
