dar-tau committed on
Commit
34b25c9
·
verified ·
1 Parent(s): b4d2f29

Update app.py

Files changed (1)
  1. app.py +3 -3
app.py CHANGED
@@ -131,10 +131,10 @@ with gr.Blocks(theme=gr.themes.Default(), css=css) as demo:
131
  gr.Markdown('''
132
  # 😎 Self-Interpreting Models 😎
133
 
134
- 👾 **This space follows the emerging trend of models interpreting their _own hidden states_ in free form natural language**!! 👾
135
  This idea was explored in the paper **Patchscopes** ([Ghandeharioun et al., 2024](https://arxiv.org/abs/2401.06102)) and was later investigated further in **SelfIE** ([Chen et al., 2024](https://arxiv.org/abs/2403.10949)).
136
- An honorary mention for **Speaking Probes** ([Dar, 2023](https://towardsdatascience.com/speaking-probes-self-interpreting-models-7a3dc6cb33d6) -- my post!! 🥳) which was a less mature approach but with the same idea in mind.
137
- We follow the SelfIE implementation in this space for concreteness. Patchscopes are so general that they encompass many other interpretation techniques too!!!
138
 
139
  👾 **The idea is really simple: models are able to understand their own hidden states by nature!** 👾
140
  If I give a model a prompt of the form ``User: [X] Assistant: Sure'll I'll repeat your message`` and replace ``[X]`` *during computation* with the hidden state we want to understand,
 
131
  gr.Markdown('''
132
  # 😎 Self-Interpreting Models 😎
133
 
134
+ 👾 **This space is a simple introduction to the emerging trend of models interpreting their _own hidden states_ in free form natural language**!! 👾
135
  This idea was explored in the paper **Patchscopes** ([Ghandeharioun et al., 2024](https://arxiv.org/abs/2401.06102)) and was later investigated further in **SelfIE** ([Chen et al., 2024](https://arxiv.org/abs/2403.10949)).
136
+ An honorary mention of **Speaking Probes** ([Dar, 2023](https://towardsdatascience.com/speaking-probes-self-interpreting-models-7a3dc6cb33d6) -- my own work!! 🥳) which was a less mature but had the same idea in mind.
137
+ We will follow the SelfIE implementation in this space for concreteness. Patchscopes are so general that they encompass many other interpretation techniques too!!!
138
 
139
  👾 **The idea is really simple: models are able to understand their own hidden states by nature!** 👾
140
  If I give a model a prompt of the form ``User: [X] Assistant: Sure'll I'll repeat your message`` and replace ``[X]`` *during computation* with the hidden state we want to understand,