Update README.md
README.md CHANGED
@@ -75,8 +75,7 @@ The model was not finetuned or post-trained, but due to inclusion of instruction
 messages = []
 messages.append({"role": "system", "content": "You are a helpful assistant."})
 messages.append({"role": "user", "content": "What do you think of Goethe's Faust?"})
-
-chat_input = tokenizer.apply_chat_template(formatted_messages, tokenize=False, add_generation_prompt=True)
+chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 print(chat_input)
 input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

@@ -153,7 +152,10 @@ After finishing all iterations, the coda block processes the last state and prod
 Please refer to the paper for benchmark performance on standard benchmarks.

 ## Limitations
-Our checkpoint is trained for only 47000 steps on a broadly untested mixture
+Our checkpoint is trained for only 47000 steps on a broadly untested data mixture with a constant learning rate. As an academic project, the model is trained only on publicly available data, and the 800B token count, while large in comparison to older fully open-source models such as the Pythia series, is small in comparison to modern open-source efforts such as OLMo and tiny in comparison to the datasets used to train industrial open-weight models.
+
+## Technical Specifications
+This model was trained on 21 segments of 4096 AMD MI-250X GPUs on the OLCF Frontier Supercomputer in early December 2024. The model was trained using ROCm 6.2.0 and a PyTorch 2.6 nightly pre-release (24/11/02). The code used to train the model can be found at https://github.com/seal-rg/recurrent-pretraining.

 ## License
 This model is released under the [apache-2.0](https://choosealicense.com/licenses/apache-2.0/) licence.
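The usage snippet touched in the first hunk assumes that `tokenizer`, `model`, and `device` were created earlier in the README. A minimal end-to-end sketch of the same two-step pattern (render the chat template to a string, then tokenize it separately) is given below; the checkpoint id, the `trust_remote_code=True` flag, and the generation settings are illustrative assumptions, not details taken from this README.

```python
# Minimal sketch of the chat-template usage shown in the diff above.
# The checkpoint id is a placeholder; trust_remote_code and the generation
# settings are assumptions, not values from the README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = "org-name/model-name"  # placeholder checkpoint id

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

messages = []
messages.append({"role": "system", "content": "You are a helpful assistant."})
messages.append({"role": "user", "content": "What do you think of Goethe's Faust?"})

# Render the chat template to plain text, then tokenize it ourselves,
# mirroring the two-step pattern in the README snippet.
chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(chat_input)
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

# Generate a reply and strip the prompt tokens before decoding.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True))
```

Keeping `tokenize=False` and calling `tokenizer.encode` separately makes the rendered prompt printable for inspection before any tokens are produced, which is why the snippet prints `chat_input` before encoding it.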