SAEs and transcoders can be loaded using the library at https://github.com/EleutherAI/sae; a loading sketch is shown below.
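
A minimal loading sketch using the `Sae.load_from_hub` and `Sae.load_many` helpers from the sae library. The exact hookpoint name (`"layers.10.mlp"` here) is an assumption based on the layer description below; check the hookpoint directories in this repo for the actual names.

```python
from sae import Sae

# Load the transcoder for a single layer. The hookpoint string is an
# assumption; verify it against the directory names in this repo.
transcoder = Sae.load_from_hub(
    "EleutherAI/skip-transcoder-DeepSeek-R1-Distill-Qwen-1.5B-65k",
    hookpoint="layers.10.mlp",
)

# Or load all of the transcoders at once, as a dict keyed by hookpoint.
transcoders = Sae.load_many(
    "EleutherAI/skip-transcoder-DeepSeek-R1-Distill-Qwen-1.5B-65k"
)
```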

These transcoders were trained on the outputs of the first 15 MLP layers of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, using 10 billion tokens from the deduplicated FineWeb-Edu dataset at a context length of 2048. Each transcoder has 65,536 latents and includes a linear skip connection.
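
A conceptual sketch of the skip-transcoder forward pass, not the library's exact implementation: it assumes a TopK activation (as used by the sae library's trainer) and illustrates how the linear skip connection adds a learned linear map of the MLP input to the reconstruction of the MLP output. The parameter shapes and `k=32` are illustrative assumptions.

```python
import torch

def skip_transcoder_forward(x, W_enc, b_enc, W_dec, b_dec, W_skip, k=32):
    # x: MLP input activations, shape (batch, d_model)
    # W_enc: (n_latents, d_model), W_dec: (n_latents, d_model), W_skip: (d_model, d_model)
    pre_acts = x @ W_enc.T + b_enc                 # (batch, n_latents)
    topk = torch.topk(pre_acts, k, dim=-1)         # keep only the k largest latents
    latents = torch.zeros_like(pre_acts).scatter_(-1, topk.indices, topk.values)
    # Reconstruct the MLP output; the skip term x @ W_skip.T is the
    # "linear skip connection" mentioned above.
    return latents @ W_dec + b_dec + x @ W_skip.T
```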

The fraction of variance unexplained (FVU) ranges from 0.01 to 0.37 across the 15 layers.
