JonasGeiping committed
Commit e6349f9 · verified · 1 Parent(s): 88b2247

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -93,7 +93,7 @@ pipeline_tag: text-generation
 ---
 
 # Huginn-0125
-This is Huginn, version 01/25. This is a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. This is a proof-of-concept model, but surprisingly capable in reasoning and code given its training budget and size.
+This is Huginn, version 01/25, a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. This is a proof-of-concept model, but surprisingly capable in reasoning and code given its training budget and size.
 All details on this model can be found in the tech report: "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach." (https://www.arxiv.org/abs/2502.05171)
 
 8 intermediate checkpoints of the model can be found in its collection. Additional intermediate checkpoints are available upon request while we find a place to host all ~350 of them. The data used to train
@@ -189,8 +189,8 @@ outputs = model(input_ids=input_ids, use_cache=True, past_key_values=past_key_va
 ## Advanced Features
 
 ### Per-Token Adaptive Compute
-When generating, you can also a variable amount of compute per-token. The model is not trained for this, so this is a proof-of-concept, that can do this task zero-shot.
-You can pick between a few sane stopping rules, `entropy-diff`, `latent-diff`,`kl` and `argmax-stability`, via `criterion=kl`. The exit threshold can be modified via `exit_threshold=5e-4`.
+When generating, you can use a variable amount of compute per token. The model is not trained for this, so this is a proof-of-concept that it can do this task zero-shot.
+You can pick between a few sane stopping rules, `entropy-diff`, `latent-diff`, `kl` and `argmax-stability`, via `criterion=...`. The exit threshold can be modified via `exit_threshold=5e-4`.
 We suggest using `kl` for interesting exits and `argmax-stability` for conservative exits. Note that using these variables overrides the default generation function. Not all arguments that are valid for the normal `generate` call are valid here. To make this more explicit, you can also directly call `generate_with_adaptive_compute`:
 
 ```python
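
For concreteness, here is a minimal sketch of the adaptive-compute call the changed lines describe. The repo id `tomg-group-umd/huginn-0125` and the exact keyword arguments accepted by `generate_with_adaptive_compute` are assumptions based on this README excerpt, not verified against the model's remote code:

```python
# Minimal sketch; repo id and accepted kwargs are unverified assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tomg-group-umd/huginn-0125"  # assumed Hugging Face repo id for Huginn-0125
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # the custom generation paths live in the model's remote code
)
tokenizer = AutoTokenizer.from_pretrained(repo)

input_ids = tokenizer("The capital of Westphalia is", return_tensors="pt").input_ids

# Call the adaptive-compute generator directly instead of the default `generate`:
# `criterion` selects the per-token stopping rule, `exit_threshold` its cutoff.
outputs = model.generate_with_adaptive_compute(
    input_ids,
    max_new_tokens=64,  # assumed to be accepted; the README warns that not all
    criterion="kl",     # normal `generate` arguments are valid here
    exit_threshold=5e-4,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Per the guidance in the changed lines, swapping `criterion="kl"` for `"argmax-stability"` trades more interesting early exits for more conservative ones.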