Data: C4 and CodeParrot, mixed roughly 1:1 sample-wise but about 1:4 token-wise, so the mix is significantly biased toward code (Python, Go, Java, JavaScript, C, C++). Trained for 1 epoch with a 48x SAE expansion factor instead of the 32x default.
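The gap between the 1:1 sample-wise and 1:4 token-wise ratios falls out of code documents simply being longer on average. A minimal sketch, with hypothetical average document lengths chosen for illustration (not measured from the actual corpora):

```python
import random

random.seed(0)

# Hypothetical average document lengths in tokens; code documents are
# assumed ~4x longer, which turns a 1:1 sample mix into a ~1:4 token mix.
avg_len = {"c4": 500, "code": 2000}

c4_tokens = code_tokens = 0
for _ in range(100_000):
    # 1:1 sample-wise mixing: each draw picks either corpus with equal odds.
    source = random.choice(["c4", "code"])
    if source == "c4":
        c4_tokens += avg_len["c4"]
    else:
        code_tokens += avg_len["code"]

# Token-wise ratio comes out close to 4:1 in favor of code.
print(round(code_tokens / c4_tokens, 2))
```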
Params:
- Batch size: 64 × 2048 × 8 = 1,048,576 tokens per step
- Learning rate: set automatically by the EleutherAI `sae` codebase
- `auxk_alpha`: 0.03
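The token batch size above decomposes into per-device batch, context length, and device count; the labels for the three factors are my assumption about what 64, 2048, and 8 denote, but the arithmetic matches the stated total:

```python
# Effective token batch size, from the 64 * 2048 * 8 figure in the notes.
micro_batch = 64    # sequences per device per step (assumed)
ctx_len = 2048      # tokens per sequence (assumed)
n_devices = 8       # data-parallel replicas (assumed)

tokens_per_step = micro_batch * ctx_len * n_devices
print(tokens_per_step)  # 1048576
```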