Upload README.md with huggingface_hub

README.md CHANGED

@@ -58,7 +58,7 @@ logits, emb = model(inputs)
 
 ### Training Data
 
-- **Pretraining corpus:** Our initial model chrono-gpt-v1-19991231 is pretrained on
+- **Pretraining corpus:** Our initial model chrono-gpt-v1-19991231 is pretrained on 21 billion tokens of pre-2000, diverse, high-quality, open-source text data, ensuring no leakage of post-2000 data.
 - **Incremental updates:** Yearly updates from 2000 to 2024 with an additional 65 billion tokens of timestamped text.
 
 ### Training Procedure