Update README.md
README.md CHANGED

````diff
@@ -93,7 +93,7 @@ pipeline_tag: text-generation
 ---
 
 # Huginn-0125
-This is Huginn, version 01/25
+This is Huginn, version 01/25, a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. This is a proof-of-concept model, but it is surprisingly capable in reasoning and code given its training budget and size.
 All details on this model can be found in the tech report: "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach." (https://www.arxiv.org/abs/2502.05171)
 
 8 intermediate checkpoints of the model can be found in its collection. Additional intermediate checkpoints are available upon request while we find a place to host all ~350 of them. The data used to train
@@ -189,8 +189,8 @@ outputs = model(input_ids=input_ids, use_cache=True, past_key_values=past_key_va
 ## Advanced Features
 
 ### Per-Token Adaptive Compute
-When generating, you can
-You can pick between a few sane stopping rules, `entropy-diff`, `latent-diff`, `kl` and `argmax-stability`, via `criterion
+When generating, you can use a variable amount of compute per token. The model is not trained for this, so this is a proof of concept that it can do this task zero-shot.
+You can pick between a few sane stopping rules, `entropy-diff`, `latent-diff`, `kl`, and `argmax-stability`, via `criterion=...`. The exit threshold can be modified via `exit_threshold=5e-4`.
 We suggest using `kl` for interesting exits and `argmax-stability` for conservative exits. Note that using these variables overrides the default generation function. Not all arguments that are valid for the normal `generate` call are valid here. To make this more explicit, you can also directly call `generate_with_adaptive_compute`:
 
 ```python
````