ChronoBERT
ChronoBERT is a series of high-performance, chronologically consistent large language models (LLMs) designed to eliminate lookahead bias and training leakage while maintaining good language understanding in time-sensitive applications. The models are pretrained on diverse, high-quality, open-source, and timestamped text to maintain chronological consistency.

All models in the series achieve GLUE benchmark scores that surpass standard BERT. This approach preserves the integrity of historical analysis and enables more reliable economic and financial modeling.
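In practice, avoiding lookahead bias means using the checkpoint whose training-data cutoff precedes the period you are analyzing. The sketch below illustrates this selection logic; only the `chrono-bert-v1-19991231` checkpoint is referenced in this card, so the other vintage dates are hypothetical placeholders that merely follow the same naming pattern.

```python
from datetime import date

# Hypothetical list of ChronoBERT vintages (YYYYMMDD knowledge cutoffs);
# only chrono-bert-v1-19991231 appears in this card, the rest are illustrative.
VINTAGES = ["19991231", "20041231", "20091231", "20141231", "20191231"]

def pick_checkpoint(analysis_start: date) -> str:
    """Return the latest checkpoint whose cutoff strictly precedes the analysis period."""
    eligible = [v for v in VINTAGES
                if date(int(v[:4]), int(v[4:6]), int(v[6:])) < analysis_start]
    if not eligible:
        raise ValueError("No checkpoint predates the analysis period.")
    return f"manelalab/chrono-bert-v1-{max(eligible)}"

# Example: a backtest over 2005 data should only see text available before 2005.
print(pick_checkpoint(date(2005, 1, 1)))  # -> manelalab/chrono-bert-v1-20041231
```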
The model is compatible with the transformers library starting from v4.48.0:

```bash
pip install -U "transformers>=4.48.0"
pip install flash-attn
```
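As a quick sanity check, you can verify that the installed transformers version meets the v4.48.0 requirement (a minimal snippet, not part of the original card):

```python
import transformers
from packaging import version

# ChronoBERT requires transformers >= 4.48.0.
assert version.parse(transformers.__version__) >= version.parse("4.48.0"), transformers.__version__
```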
Here is example code for using the model:
```python
from transformers import AutoTokenizer, AutoModel

device = 'cuda:0'

# Load the 1999-12-31 vintage of ChronoBERT.
tokenizer = AutoTokenizer.from_pretrained("manelalab/chrono-bert-v1-19991231")
model = AutoModel.from_pretrained("manelalab/chrono-bert-v1-19991231").to(device)

text = "Obviously, the time continuum has been disrupted, creating a new temporal event sequence resulting in this alternate reality. -- Dr. Brown, Back to the Future Part II"

# Tokenize and run a forward pass.
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model(**inputs)
```
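The forward pass above returns raw hidden states. A minimal sketch of turning them into a single sentence embedding via attention-mask-aware mean pooling (a common choice for encoder models, not necessarily the pooling used by the authors) might look like:

```python
import torch

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings, ignoring padding positions.
hidden = outputs.last_hidden_state                     # (batch, seq_len, hidden_dim)
mask = inputs["attention_mask"].unsqueeze(-1).float()  # (batch, seq_len, 1)
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

print(embedding.shape)  # e.g. torch.Size([1, 768]) for a base-sized model
```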
Citation:

```bibtex
@article{He2025ChronoBERT,
  title   = {Chronologically Consistent Large Language Models},
  author  = {He, Songrun and Lv, Linying and Manela, Asaf and Wu, Jimmy},
  journal = {Working Paper},
  year    = {2025}
}
```