songrunhe committed on
Commit 6034af7 · verified · 1 Parent(s): db5c5e7

Delete .ipynb_checkpoints

.ipynb_checkpoints/README-checkpoint.md DELETED
@@ -1,99 +0,0 @@
- ---
- library_name: transformers
- license: mit
- language:
- - en
- tags:
- - chronologically consistent
- - modernbert
- - glue
- pipeline_tag: fill-mask
- inference: false
- ---
- # ChronoBERT
-
- ## Model Description
-
- ChronoBERT is a series of **high-performance, chronologically consistent large language models (LLMs)** designed to eliminate lookahead bias and training leakage while maintaining good language understanding in time-sensitive applications. The models are pretrained on **diverse, high-quality, open-source, and timestamped text** to maintain chronological consistency.
-
- All models in the series achieve **GLUE benchmark scores that surpass standard BERT.** This approach preserves the integrity of historical analysis and enables more reliable economic and financial modeling.
-
- - **Developed by:** Songrun He, Linying Lv, Asaf Manela, Jimmy Wu
- - **Model type:** Transformer-based bidirectional encoder (ModernBERT architecture)
- - **Language(s) (NLP):** English
- - **License:** MIT License
-
- ## Model Sources
-
- - **Paper:** "Chronologically Consistent Large Language Models" (He, Lv, Manela, Wu, 2025)
-
- ## How to Get Started with the Model
-
- The model is compatible with the `transformers` library starting from v4.48.0:
-
- ```sh
- pip install -U "transformers>=4.48.0"
- pip install flash-attn
- ```
-
- Here is an example of how to use the model:
-
- ```python
- from transformers import AutoTokenizer, AutoModel
- device = 'cuda:0'
-
- tokenizer = AutoTokenizer.from_pretrained("manelalab/chrono-bert-v1-19991231")
- model = AutoModel.from_pretrained("manelalab/chrono-bert-v1-19991231").to(device)
-
- text = "Obviously, the time continuum has been disrupted, creating a new temporal event sequence resulting in this alternate reality. -- Dr. Brown, Back to the Future Part II"
-
- inputs = tokenizer(text, return_tensors="pt").to(device)
- outputs = model(**inputs)
- ```
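The snippet above returns token-level hidden states from the base encoder. The card does not specify a pooling strategy, but if a single vector per text is needed, one common choice (an assumption here, not something the card documents) is attention-masked mean pooling over `outputs.last_hidden_state`, reusing the `inputs` and `outputs` variables from the example above:

```python
# Hedged follow-on sketch (the pooling choice is an assumption, not part of the card):
# collapse the token-level hidden states into one embedding per input text using
# attention-masked mean pooling.
last_hidden = outputs.last_hidden_state                # (batch, seq_len, hidden_size)
mask = inputs["attention_mask"].unsqueeze(-1).float()  # (batch, seq_len, 1)
text_embedding = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(text_embedding.shape)                            # torch.Size([1, 768])
```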
-
- ## Training Details
-
- ### Training Data
-
- - **Pretraining corpus:** Our initial model, chrono-bert-v1-19991231, is pretrained on 460 billion tokens of diverse, high-quality, open-source, pre-2000 text data, ensuring no leakage of data from later periods.
- - **Incremental updates:** Yearly updates from 2000 to 2024 with an additional 65 billion tokens of timestamped text.
-
- ### Training Procedure
-
- - **Architecture:** ModernBERT-based model with rotary embeddings and flash attention.
- - **Objective:** Masked token prediction (a sketch of this objective follows below).
-
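The card's usage snippet only returns encoder hidden states, so here is a minimal, hedged sketch of the masked-token-prediction objective itself. It assumes the published repo also exposes a masked-LM head loadable via `AutoModelForMaskedLM`; if only the base encoder weights are shipped, the head would be freshly initialized and the top predictions would not be meaningful.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Hedged sketch of the masked-token-prediction objective described above.
# Assumption: the repo can be loaded with a masked-LM head; if only the base
# encoder is published, transformers initializes the head randomly.
name = "manelalab/chrono-bert-v1-19991231"
tokenizer = AutoTokenizer.from_pretrained(name)
mlm_model = AutoModelForMaskedLM.from_pretrained(name)

text = f"Roads? Where we're going, we don't need {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = mlm_model(**inputs).logits

# Top candidate tokens for the masked position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```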
- ## Evaluation
-
- ### Testing Data, Factors & Metrics
-
- - **Language understanding:** Evaluated on **GLUE benchmark** tasks.
- - **Financial forecasting:** Evaluated using a **return prediction task** based on Dow Jones Newswire data.
- - **Comparison models:** ChronoBERT was benchmarked against **BERT, FinBERT, StoriesLM-v1-1963, and Llama 3.1**.
-
- ### Results
-
- - **GLUE score:** chrono-bert-v1-19991231 and chrono-bert-v1-20241231 achieved GLUE scores of 84.71 and 85.54, respectively, outperforming BERT (84.52).
- - **Stock return predictions:** Over the sample period from 2008-01 to 2023-07, chrono-bert-v1-realtime achieves a long-short portfolio **Sharpe ratio of 4.80**, outperforming BERT, FinBERT, and StoriesLM-v1-1963, and comparable to **Llama 3.1 8B (4.90)**.
-
-
- ## Citation
-
- ```
- @article{He2025ChronoBERT,
-   title={Chronologically Consistent Large Language Models},
-   author={He, Songrun and Lv, Linying and Manela, Asaf and Wu, Jimmy},
-   journal={Working Paper},
-   year={2025}
- }
- ```
-
- ## Model Card Authors
-
- - Songrun He (Washington University in St. Louis, [email protected])
- - Linying Lv (Washington University in St. Louis, [email protected])
- - Asaf Manela (Washington University in St. Louis, [email protected])
- - Jimmy Wu (Washington University in St. Louis, [email protected])
-
-
-
.ipynb_checkpoints/config-checkpoint.json DELETED
@@ -1,48 +0,0 @@
- {
-   "_name_or_path": "answerdotai/ModernBERT-base",
-   "architectures": [
-     "ModernBertModel"
-   ],
-   "attention_bias": false,
-   "attention_dropout": 0.0,
-   "bos_token_id": 50281,
-   "classifier_activation": "gelu",
-   "classifier_bias": false,
-   "classifier_dropout": 0.0,
-   "classifier_pooling": "mean",
-   "cls_token_id": 50281,
-   "decoder_bias": true,
-   "deterministic_flash_attn": false,
-   "embedding_dropout": 0.0,
-   "eos_token_id": 50282,
-   "global_attn_every_n_layers": 3,
-   "global_rope_theta": 160000.0,
-   "gradient_checkpointing": false,
-   "hidden_activation": "gelu",
-   "hidden_size": 768,
-   "initializer_cutoff_factor": 2.0,
-   "initializer_range": 0.02,
-   "intermediate_size": 1152,
-   "layer_norm_eps": 1e-05,
-   "local_attention": 128,
-   "local_rope_theta": 10000.0,
-   "max_position_embeddings": 8192,
-   "mlp_bias": false,
-   "mlp_dropout": 0.0,
-   "model_type": "modernbert",
-   "norm_bias": false,
-   "norm_eps": 1e-05,
-   "num_attention_heads": 12,
-   "num_hidden_layers": 22,
-   "output_hidden_states": true,
-   "pad_token_id": 50283,
-   "position_embedding_type": "absolute",
-   "reference_compile": null,
-   "repad_logits_with_grad": false,
-   "sep_token_id": 50282,
-   "sparse_pred_ignore_index": -100,
-   "sparse_prediction": false,
-   "torch_dtype": "float32",
-   "transformers_version": "4.48.2",
-   "vocab_size": 50368
- }
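For reference, the same configuration fields can be inspected programmatically. A small hedged sketch, assuming the Hub repo `manelalab/chrono-bert-v1-19991231` serves an equivalent config.json to the deleted checkpoint above:

```python
from transformers import AutoConfig

# Hedged sketch: load the published config from the Hub (assumed to match the
# deleted checkpoint file above) and inspect a few architecture fields.
config = AutoConfig.from_pretrained("manelalab/chrono-bert-v1-19991231")
print(config.model_type)                   # "modernbert"
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)
print(config.max_position_embeddings)      # long-context: 8192
print(config.global_attn_every_n_layers)   # global attention on every 3rd layer
```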