Update src/about.py
src/about.py CHANGED (+10 -10)
@@ -38,13 +38,13 @@ Welcome to the leaderboard of the MindShift!
 
 Have you ever wondered how you can measure how much your LLM is following the role it has been given? Or how depressed or optimistic it is?
 
-For this purpose, we offer you a handy tool - 🏆MindShift
+For this purpose, we offer you a handy tool - 🏆 **MindShift**.
 
-🏆MindShift - is a benchmark for assessing the psychological susceptibility of LLMs, such as perception, recognition and role performance with psychological characteristics. It is based on an AI model adaptation of the human psychometric person-oriented test (Minnesota Multiphasic Personality Inventory (MMPI)).
+🏆 **MindShift** - is a benchmark for assessing the psychological susceptibility of LLMs, such as perception, recognition and role performance with psychological characteristics. It is based on an AI model adaptation of the human psychometric person-oriented test (Minnesota Multiphasic Personality Inventory (MMPI)).
 
 It is easy to use and can assess any LLM - both instructively tuned and in its basic version. Its scales, which are easily interpreted by humans, allow you to choose the appropriate language model for your conversational assistant or a game NPC.
 
-🤗More details on the measurement approach, roles and psychological biases can be found in the
+🤗 More details on the measurement approach, roles and psychological biases can be found in the ``📝 About`` tab. See also the paper (🚀coming soon!).
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
@@ -52,7 +52,7 @@ LLM_BENCHMARKS_TEXT = f"""
 Large language models (LLMs) hold the potential to absorb and reflect personality traits and attitudes specified by users.
 
 <div style='display: flex; align-items: center; justify-content: center; text-align: center;'>
-<img src='https://github.com/IrinaArmstrong/MindShift/blob/master/figs/mindshift-concept.png' style='width: 600px; height: auto; margin-right: 10px;' />
+<img src='https://github.com/IrinaArmstrong/MindShift/blob/master/figs/mindshift-concept.png?raw=true' style='width: 600px; height: auto; margin-right: 10px;' />
 </div>
 
 ## How it works?
@@ -61,7 +61,7 @@ Large language models (LLMs) hold the potential to absorb and reflect personalit
 To reliably validate the implicit understanding of psychological personality traits in LLMs, it is crucial to adapt psychological interpretations of the scales and formulate questions specific to the language models. When asked explicit questions about inner worlds, morality, and behavioral patterns, LLMs may exhibit biased behaviors due to extensive alignment tuning. This can result in inconsistent and unrepresentative questionnaire outcomes.
 
 To assess the susceptibility of LLMs to personalization, we utilized the Standardized Multifactorial Method for Personality Research (SMMPR), which is based on the Minnesota Multiphasic Personality Inventory (MMPI). It is a questionnaire-based test consisting of 566 short statements that individuals rate as true or false for themselves.
-The test assesses psychological characteristics on 10 basic "personality profile" scales
+The test assesses psychological characteristics on **10 basic "personality profile" scales**, named after the nosological forms of corresponding disorders:
 * Hypochondria (Hs),
 * Depression (D),
 * Emotional Lability (Hy),
@@ -73,27 +73,27 @@ The test assesses psychological characteristics on 10 basic "personality profile
 * Optimism (Ma),
 * Social Introversion (Si).
 
-Additionally, the test includes three validation scales to assess the truthfulness and sincerity of the respondent's answers: Lie (L), Infrequency (F), and Defensiveness (D).
+Additionally, the test includes **three validation scales** to assess the truthfulness and sincerity of the respondent's answers: Lie (L), Infrequency (F), and Defensiveness (D).
 
 To ensure the reproducibility of our methodology for both instructively tuned and basic versions, we leveraged the LLM's ability to complete textual queries. We constructed a set of statements from the questionnaire and asked LLM to finish the prompt with only one option: True or False.
 
 <div style='display: flex; align-items: center; justify-content: center; text-align: center;'>
-<img src='https://github.com/IrinaArmstrong/MindShift/blob/master/figs/mindshift-statements.png' style='width: 600px; height: auto; margin-right: 10px;' />
+<img src='https://github.com/IrinaArmstrong/MindShift/blob/master/figs/mindshift-statements.png?raw=true' style='width: 600px; height: auto; margin-right: 10px;' />
 </div>
 
 ### Psychological prompts
 
 To measure the extent to which an LLM understands personality, MindShift at its core contains a structured method for introducing psychologically oriented biases into prompts.
-Introducing specific personality traits into an LLM can be achieved by providing it with a natural language description of the persona. In our methodology, the persona description consists of two parts: the Persona General Descriptor and the Psychological Bias Descriptor
+Introducing specific personality traits into an LLM can be achieved by providing it with a natural language description of the persona. In our methodology, the persona description consists of two parts: **the Persona General Descriptor** and the **Psychological Bias Descriptor**. The **Persona General Descriptor** includes general statements about the character's lifestyle, routines, and social aspects, while the **Psychological Bias Descriptor** covers specific psychological attitudes with varying degrees of intensity.
 
 <div style='display: flex; align-items: center; justify-content: center; text-align: center;'>
-<img src='https://github.com/IrinaArmstrong/MindShift/blob/master/figs/mindshift-input-schema.png' style='width: 600px; height: auto; margin-right: 10px;' />
+<img src='https://github.com/IrinaArmstrong/MindShift/blob/master/figs/mindshift-input-schema.png?raw=true' style='width: 600px; height: auto; margin-right: 10px;' />
 </div>
 
 They are combined with Persona General Descriptor - a full character role (including gender, age, marital status, personal circumstances, hobbies, etc.), sampled from PersonaChat dialogue dataset. Together they form a complete description of the persona.
 
 ### Paper
-You can find more details about the assessment, a list of psychological prompts, roles and experiments in the paper (🚀coming soon!).
+You can find more details about the assessment, a list of psychological prompts, roles and experiments in the paper (_🚀 coming soon!_).
 """
 
 EVALUATION_QUEUE_TEXT = """
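Note on the text in this change: it describes joining a Persona General Descriptor with a Psychological Bias Descriptor, then asking the model to finish a questionnaire statement with only True or False. A minimal sketch of that flow might look like the following. Everything here is an assumption made for illustration and is not taken from the MindShift code: the function names, the prompt template wording, and the example persona and statement (which is made up and not an item from the questionnaire).

```python
from typing import Optional

# Hypothetical prompt template; the wording is illustrative only and is
# not the benchmark's actual prompt.
PROMPT_TEMPLATE = (
    "{persona}\n\n"
    "Statement: {statement}\n"
    "Answer with only one option, True or False.\n"
    "Answer:"
)


def build_persona(general_descriptor: str, bias_descriptor: str) -> str:
    """Join the two persona parts into one natural-language description."""
    return f"{general_descriptor.strip()} {bias_descriptor.strip()}"


def build_prompt(persona: str, statement: str) -> str:
    """Build a completion-style prompt that ends where the model should answer."""
    return PROMPT_TEMPLATE.format(persona=persona, statement=statement.strip())


def parse_answer(completion: str) -> Optional[bool]:
    """Map the start of a completion to True/False, or None if it is neither."""
    text = completion.strip().lower()
    if text.startswith("true"):
        return True
    if text.startswith("false"):
        return False
    return None


persona = build_persona(
    "I am a 34-year-old teacher who likes hiking.",   # made-up general descriptor
    "Lately I feel that nothing will ever improve.",  # made-up psychological bias
)
prompt = build_prompt(persona, "I enjoy meeting new people.")  # made-up statement
```

Because the prompt ends with `Answer:`, the same text can be sent to a base model as a plain completion or to an instruction-tuned model as a user message. `parse_answer` returns `None` for anything that does not start with True or False, so off-format replies can be counted separately rather than silently scored.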