import streamlit as st
from src.configs import Languages


def write(*args):
    """Render the "How it works" steps and the FAQ section of the Wordify page."""

    # ==== HOW IT WORKS ==== #
    with st.beta_container():
        st.markdown("")
        st.markdown("")
        st.markdown(
            """
            Wordify makes it easy to identify words that discriminate categories in textual data.

            Let's explain Wordify with an example. Imagine you are thinking about having a glass
            of wine :wine_glass: with your friends :man-man-girl-girl: and you have to buy a bottle.
            You know you like `bold`, `woody` wine but are unsure which one to choose.
            You wonder whether there are some words that describe each type of wine.
            Since you are a researcher :female-scientist: :male-scientist:, you decide to approach
            the problem scientifically :microscope:. That's where Wordify comes to the rescue!
            """
        )
        st.markdown("")
        st.markdown("")
        st.header("Steps")
        st.subheader("Step 1 - Prepare your data")
        st.markdown(
            """
            Create an Excel or CSV file with two columns:

            - a column with the name or label identifying a specific object or class (e.g., in the
            wine example above, the type of wine or the name of a specific brand). It is common
            practice to name this column `label`.

            - a column with the text describing that specific object or class (e.g., in the wine
            example above, the description printed on the back label of the bottle). It is common
            practice to name this column `text`.

            For reliable results, we suggest providing at least 2000 labelled texts. If you provide
            fewer, we will still wordify your file, but the results should be taken with a grain of
            salt.

            We also support multi-language texts, so you can automatically discriminate between
            international wines, even if your preferred Italian producer does not provide a
            description written in English!
            """
        )

        st.subheader("Step 2 - Upload your file and Wordify!")
        st.markdown(
            """
            1. Once you have prepared your Excel or CSV file, click the "Browse File" button and
            select your file.
            2. Choose the language of your texts (select multi-language if your file contains text
            in different languages).
            3. Push the "Wordify!" button, sit back, and wait for Wordify to do its tricks.

            Depending on the size of your data, the process can take from 1 to 5 minutes.
            """
        )

    # ==== FAQ ==== #
    with st.beta_container():
        st.markdown("")
        st.markdown("")
        st.header(":question: Frequently Asked Questions")
        with st.beta_expander("What is Wordify?"):
            st.markdown(
                """
                Wordify is a way to find out which terms are most indicative of each value of your
                dependent variable.
                """
            )

        with st.beta_expander("What happens to my data?"):
            st.markdown(
                """
                Nothing. We never store the data you upload on disk: it is only kept in memory for the
                duration of the modeling, and then deleted. We do not retain any copies or traces of
                your data.
                """
            )

        with st.beta_expander("What input formats do you support?"):
            st.markdown(
                """
                The file you upload should be an Excel (.xlsx) or CSV file with two columns. One
                should be labeled 'text' and contain all your documents (e.g., tweets, reviews,
                patents, etc.), one per row. The other should be labeled 'label' and contain the
                dependent variable label associated with each text (e.g., rating, author gender,
                company, etc.).
                """
            )

        with st.beta_expander("How does it work?"):
            st.markdown(
                """
                It uses a variant of the Stability Selection algorithm
                [(Meinshausen and Bühlmann, 2010)](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2010.00740.x)
                to fit hundreds of logistic regression models on random subsets of the data, using
                different L1 penalties to drive as many term coefficients as possible to 0. Any term
                that receives a non-zero coefficient in at least 30% of all model runs can be seen as
                a stable indicator.
                """
            )

        with st.beta_expander("How much data do I need?"):
            st.markdown(
                """
                We recommend at least 2000 instances; the more, the better. With fewer instances, the
                results are less replicable and less reliable.
                """
            )

        with st.beta_expander("Is there a paper I can cite?"):
            st.markdown(
                """
                Yes please! Reference coming soon...
                """
            )

        with st.beta_expander("What languages are supported?"):
            st.markdown(
                f"""
                Currently we support: {", ".join([i.name for i in Languages])}.
                """
            )
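

# ---------------------------------------------------------------------------
# Illustrative sketch only; nothing below is called by the page above.
# It restates two rules from the help text as code: the `label`/`text` column
# requirement with the 2000-row recommendation (Step 1), and the "non-zero
# coefficient in at least 30% of model runs" rule (FAQ). The names
# `check_wordify_csv`, `stable_terms` and `MIN_RECOMMENDED_ROWS` are
# hypothetical and are not Wordify's actual implementation. CSV input is
# assumed for simplicity; Excel files would need a different reader.
# ---------------------------------------------------------------------------
import csv
import io

MIN_RECOMMENDED_ROWS = 2000  # recommendation quoted in the help text


def check_wordify_csv(raw_text):
    """Return (n_rows, warnings) for CSV content passed as a string."""
    reader = csv.DictReader(io.StringIO(raw_text))
    # The header row must provide both expected column names.
    missing = {"label", "text"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required column(s): {sorted(missing)}")
    n_rows = sum(1 for _ in reader)
    warnings = []
    if n_rows < MIN_RECOMMENDED_ROWS:
        warnings.append(
            f"Only {n_rows} rows; at least {MIN_RECOMMENDED_ROWS} are recommended."
        )
    return n_rows, warnings


def stable_terms(runs, threshold=0.3):
    """Return terms with a non-zero coefficient in at least `threshold` of runs.

    `runs` holds one dict per fitted model, mapping term -> coefficient.
    """
    if not runs:
        return []
    counts = {}
    for coefficients in runs:
        for term, value in coefficients.items():
            if value != 0:
                counts[term] = counts.get(term, 0) + 1
    return sorted(t for t, c in counts.items() if c / len(runs) >= threshold)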