Global batch size: 384, seq_len: 2048
Checkpoint every 500 steps
i.e. every 393,216,000 tokens (384 × 2048 × 500), or roughly 400M tokens
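The token count per checkpoint follows directly from the three numbers above. A minimal sketch of that arithmetic, assuming every sequence in the batch is filled to the full seq_len (the constant and function names are illustrative, not taken from the training code):

```python
# Values stated in this card.
GLOBAL_BATCH_SIZE = 384     # sequences per optimizer step
SEQ_LEN = 2048              # tokens per sequence
CHECKPOINT_INTERVAL = 500   # optimizer steps between checkpoints


def tokens_seen(step: int) -> int:
    """Tokens consumed after `step` optimizer steps.

    Assumes every sequence is packed to the full SEQ_LEN (an assumption,
    not something the card states).
    """
    return step * GLOBAL_BATCH_SIZE * SEQ_LEN


if __name__ == "__main__":
    # Print the cumulative token count at each checkpoint up to step 2500.
    for step in range(CHECKPOINT_INTERVAL, 2501, CHECKPOINT_INTERVAL):
        print(f"checkpoint-{step}: {tokens_seen(step):,} tokens")
```

Each step consumes 384 × 2048 = 786,432 tokens, so 500 steps give 393,216,000.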
Current revisions available as:
revision          tokens seen
checkpoint-500    393M
checkpoint-1000   786M
checkpoint-1500   1.18B
checkpoint-2000   1.57B
checkpoint-2500   1.96B
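An intermediate checkpoint can usually be selected through the `revision` argument of `from_pretrained` in the transformers library. The sketch below assumes that each checkpoint is published as a revision (branch or tag) named `checkpoint-<step>` and that the model loads with `AutoModelForCausalLM`; neither is confirmed by this card. The helper names and the `repo_id` argument are hypothetical.

```python
def revision_for_step(step: int, interval: int = 500) -> str:
    """Map a training step to the revision name used in this card."""
    if step <= 0 or step % interval != 0:
        raise ValueError(f"step must be a positive multiple of {interval}, got {step}")
    return f"checkpoint-{step}"


def load_checkpoint(repo_id: str, step: int):
    """Load the model weights saved at `step` from the Hub.

    `repo_id` is this model's Hub identifier, which the caller must supply.
    Assumes a causal language model; swap the Auto class if that is wrong.
    """
    # Imported lazily so revision_for_step works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        repo_id,
        revision=revision_for_step(step),
    )
```

For example, `load_checkpoint(repo_id, 1500)` would request the revision `checkpoint-1500`, which corresponds to about 1.18B training tokens.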
max_lr: 7e-5