JonasGeiping committed · verified
Commit 64e294e · 1 Parent(s): 288f483

Update README.md

Files changed (1): README.md (+5 -3)
README.md CHANGED
@@ -75,8 +75,7 @@ The model was not finetuned or post-trained, but due to inclusion of instruction
  messages = []
  messages.append({"role": "system", "content" : "You are a helpful assistant."})
  messages.append({"role": "user", "content" : "What do you think of Goethe's Faust?"})
- formatted_messages = [{"role": "Huginn" if m["role"] == "assistant" else m["role"], "content": m.content.strip()} for m in messages]
- chat_input = tokenizer.apply_chat_template(formatted_messages, tokenize=False, add_generation_prompt=True)
+ chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  print(chat_input)
  input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)
 
@@ -153,7 +152,10 @@ After finishing all iterations, the coda block processes the last state and prod
  Please refer to the paper for benchmark performance on standard benchmarks.

  ## Limitations
- Our checkpoint is trained for only 47000 steps on a broadly untested mixture, and the learning rate is never cooled down from its peak. As an academic project, the model is trained only on publicly available data and the 800B token count, while large in comparison to older fully open-source models such as the Pythia series, is small in comparison to modern open-source efforts such as OLMo, and tiny in comparison to the datasets used to train industrial open-weight models.
+ Our checkpoint is trained for only 47,000 steps on a broadly untested data mixture with a constant learning rate. As an academic project, the model is trained only on publicly available data, and the 800B token count, while large in comparison to older fully open-source models such as the Pythia series, is small in comparison to modern open-source efforts such as OLMo, and tiny in comparison to the datasets used to train industrial open-weight models.
+
+ ## Technical Specifications
+ This model was trained on 21 segments of 4096 AMD MI-250X GPUs on the OLCF Frontier supercomputer in early December 2024. The model was trained using ROCm 6.2.0 and a PyTorch 2.6 nightly pre-release (24/11/02). The code used to train the model can be found at https://github.com/seal-rg/recurrent-pretraining.

  ## License
  This model is released under the [apache-2.0](https://choosealicense.com/licenses/apache-2.0/) licence.
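For context, here is a minimal sketch of how the updated snippet from the first hunk fits into a complete script. Only the `messages` construction and the `apply_chat_template` / `encode` lines come from this diff; the checkpoint id, `trust_remote_code=True`, and the generation call are assumptions added for illustration.

```python
# Minimal sketch around the updated chat-template snippet.
# The checkpoint id and generation settings are assumptions, not part of this diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_id = "tomg-group-umd/huginn-0125"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device)

messages = []
messages.append({"role": "system", "content": "You are a helpful assistant."})
messages.append({"role": "user", "content": "What do you think of Goethe's Faust?"})

# After this commit, the messages list is passed directly to the chat template,
# without the manual assistant -> "Huginn" role remapping that was removed.
chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(chat_input)
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```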
 