Spaces: Glaciohound/LM-Steer


hanchier committed on Sep 30, 2024

Commit

3ecbde7

1 Parent(s): acd6966

word embeddings


Files changed (1)

app.py +10 -15


@@ -36,7 +36,7 @@ def word_embedding_space_analysis(
     S, V, D = torch.linalg.svd(matrix)
     data = []
-    top = 30
+    top = 50
     select_words = 20
     n_dim = 10
     for _i in range(n_dim):
@@ -54,15 +54,16 @@ def word_embedding_space_analysis(
                     word = word[1:]
                     if word.lower() in nltk.corpus.words.words():
                         output.append(word)
-            return output[:select_words]
-        data.append([
-            ", ".join(filter_words(side_tokens))
-            for side_tokens in [left_tokens, right_tokens]
-        ])
+            return output
+        left_tokens = filter_words(left_tokens)
+        right_tokens = filter_words(right_tokens)
+        if len(left_tokens) < len(right_tokens):
+            left_tokens = right_tokens
+        data.append(", ".join(left_tokens[:select_words]))
     return pd.DataFrame(
         data,
-        columns=["One Direction", "Another Direction"],
+        columns=["Words Contributing to the Style"],
         index=[f"Dim#{_i}" for _i in range(n_dim)],
     )
@@ -196,7 +197,7 @@ def main():
     # Analysing the sentence
     st.divider()
     st.divider()
-    st.subheader("LM-Steer Converts LMs into Text Analyzers")
+    st.subheader("LM-Steer Converts Any LM Into A Text Analyzer")
     '''
     LM-Steer also serves as a probe for analyzing the text. It can be used to
     analyze the sentiment and detoxification of the text. Now, we proceed and
@@ -267,14 +268,8 @@ def main():
     embeddings: what word dimensions contribute to or contrast to a specific
     style. This analysis can be used to understand the word embedding space
     and how it steers the model's generation.
-    Note that due to the bidirectional nature of the embedding spaces, in each
-    dimension, sometimes only one side of the word embeddings contributes
-    (has an impact on the style), while the other side, (resulting in negative
-    logits) has a negligible impact on the style. The table below shows both
-    sides of the word embeddings in each dimension.
     '''
-    for dimension in ["Sentiment", "Detoxification"]:
+    for dimension in ["Detoxification", "Sentiment"]:
         f'##### {dimension} Word Dimensions'
         dim = 2 if dimension == "Sentiment" else 0
         analysis_result = word_embedding_space_analysis(
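
The second hunk changes what each table row holds. The old code emitted both sides of a dimension as two columns. The new code filters both sides first, keeps whichever side has more dictionary words, and only then truncates to `select_words`. The block below is a minimal sketch of that selection step, not the app's actual code. `ENGLISH_VOCAB` and `dominant_side` are hypothetical names, the small vocabulary set stands in for `nltk.corpus.words.words()`, and the leading-marker check (`Ġ`) is an assumption, since the diff does not show the condition that guards `word = word[1:]`.

```python
from typing import Iterable, List, Set

# Hypothetical stand-in for nltk.corpus.words.words(); the app itself uses NLTK.
ENGLISH_VOCAB: Set[str] = {"good", "great", "happy", "bad", "awful"}


def filter_words(side_tokens: Iterable[str],
                 vocab: Set[str] = ENGLISH_VOCAB) -> List[str]:
    """Keep tokens that, once their leading marker is stripped, are in vocab."""
    output = []
    for word in side_tokens:
        # Assumption: tokens that begin a word carry a leading "Ġ" marker,
        # as in GPT-2 style tokenizers. The diff elides the real condition.
        if word.startswith("Ġ"):
            word = word[1:]
            if word.lower() in vocab:
                output.append(word)
    return output


def dominant_side(left_tokens: Iterable[str],
                  right_tokens: Iterable[str],
                  select_words: int = 20,
                  vocab: Set[str] = ENGLISH_VOCAB) -> str:
    """Return the side with more surviving words, truncated and comma-joined."""
    left = filter_words(left_tokens, vocab)
    right = filter_words(right_tokens, vocab)
    # Mirrors the committed logic: the right side wins only if strictly longer,
    # so ties keep the left side.
    if len(left) < len(right):
        left = right
    return ", ".join(left[:select_words])


if __name__ == "__main__":
    print(dominant_side(["Ġgood", "Ġgreat", "xx"], ["Ġbad"]))
```

Because truncation now happens after the comparison, the choice between sides depends on how many of the `top` candidates survive filtering rather than on the first `select_words` of each. This sketch also looks words up in a set, whereas the committed code calls `nltk.corpus.words.words()` inside the loop, which is worth profiling if the analysis feels slow.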