---
license: apache-2.0
language:
- en
tags:
- mechanistic interpretability
- sparse autoencoder
- llama
- llama-3
---

## Model Information

An SAE (Sparse Autoencoder) for [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct).

It was trained specifically on layer 9 of Llama 3.2 1B and reached a final L0 of 63 during training, i.e. on average about 63 features are active per token.

It is used to decompose Llama's activations into interpretable features.
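
For intuition, a minimal sketch of a standard ReLU sparse autoencoder is shown below. The hidden size of 2048 matches Llama 3.2 1B; the feature count and architecture details are illustrative assumptions, not the confirmed configuration of this release.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal ReLU SAE sketch: encode activations into an overcomplete
    sparse code, then decode back to the original space."""

    def __init__(self, d_model: int = 2048, d_features: int = 16384):
        # d_features (the expansion factor) is an assumption; the released
        # checkpoint defines the real sizes.
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        # Sparse feature activations: most entries are zero after the ReLU
        # (an L0 of 63 means ~63 features fire per token on average).
        features = torch.relu(self.encoder(x))
        # Reconstruction of the original layer-9 activation.
        x_hat = self.decoder(features)
        return features, x_hat
```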

The SAE weights are released under the Apache 2.0 license; note, however, that Llama 3.2 1B itself is to be used under Meta's Llama 3.2 license.

## How to use

A Jupyter notebook is provided to test the model:

<a target="_blank" href="https://colab.research.google.com/github/qrsch/SAE/blob/main/SAE.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" width="200px"/>
</a>
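
Alternatively, the sketch below shows one way to capture layer 9 activations from Llama 3.2 1B with `transformers` and run them through the SAE. The `SparseAutoencoder` class is the illustrative one from the sketch above, and the checkpoint filename `sae_layer9.pt` is a placeholder, not a confirmed file in this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()

# Capture the output of decoder layer 9 with a forward hook.
captured = {}

def hook(module, inputs, output):
    # Decoder layers typically return a tuple; hidden states come first.
    captured["h"] = output[0] if isinstance(output, tuple) else output

handle = model.model.layers[9].register_forward_hook(hook)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    model(**inputs)
handle.remove()

# Decompose the captured activations into sparse features.
sae = SparseAutoencoder(d_model=model.config.hidden_size)
sae.load_state_dict(torch.load("sae_layer9.pt"))  # placeholder filename
features, _ = sae(captured["h"].float())
print(features.shape)  # (batch, seq_len, d_features)
```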

## Training

Our SAE was trained on the [LMSYS-Chat-1M dataset](https://arxiv.org/pdf/2309.11998) on a single RTX 3090. The training script will be provided soon in the following repository: https://github.com/qrsch/SAE/tree/main
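
This card does not specify the training objective; a common choice, shown as a sketch below, is mean-squared reconstruction error plus an L1 sparsity penalty on the feature activations. The coefficient is an illustrative assumption.

```python
import torch

def sae_loss(x: torch.Tensor, sae, l1_coeff: float = 3e-4) -> torch.Tensor:
    """Standard SAE objective sketch: reconstruct the activation while
    keeping the feature code sparse. Hyperparameters are illustrative."""
    features, x_hat = sae(x)
    recon = (x_hat - x).pow(2).sum(dim=-1).mean()  # reconstruction error
    sparsity = features.abs().sum(dim=-1).mean()   # L1 pushes features to zero
    return recon + l1_coeff * sparsity
```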