Update README.md
README.md CHANGED

````diff
@@ -93,7 +93,7 @@ pipeline_tag: text-generation
 ---
 
 # Huginn-0125
-This is Huginn, version 01/25
+This is Huginn, version 01/25, a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. This is a proof-of-concept model, but it is surprisingly capable in reasoning and code given its training budget and size.
 All details on this model can be found in the tech report: "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach." (https://www.arxiv.org/abs/2502.05171)
 
 8 intermediate checkpoints of the model can be found in its collection. Additional intermediate checkpoints are available upon request while we find a place to host all ~350 of them. The data used to train
@@ -189,8 +189,8 @@ outputs = model(input_ids=input_ids, use_cache=True, past_key_values=past_key_va
 ## Advanced Features
 
 ### Per-Token Adaptive Compute
-When generating, you can
-You can pick between a few sane stopping rules, `entropy-diff`, `latent-diff`, `kl` and `argmax-stability`, via `criterion
+When generating, you can use a variable amount of compute per token. The model is not trained for this, so this is a proof of concept that it can do this task zero-shot.
+You can pick between a few sane stopping rules, `entropy-diff`, `latent-diff`, `kl`, and `argmax-stability`, via `criterion=...`. The exit threshold can be modified via `exit_threshold=5e-4`.
 We suggest using `kl` for interesting exits and `argmax-stability` for conservative exits. Note that using these variables overrides the default generation function. Not all arguments that are valid for the normal `generate` call are valid here. To make this more explicit, you can also directly call `generate_with_adaptive_compute`:
 
 ```python
````