import streamlit as st
from src.configs import Languages


def write(*args):

    # ==== HOW IT WORKS ==== #
    with st.beta_container():
        st.markdown("")
        st.markdown("")
        st.header("How it works")
        st.subheader("Step 1 - Prepare your data")
        st.markdown(
            """
            Create an Excel or CSV file with two columns for each row:

            - a column with the name or label identifying a specific object or class (e.g., in our
            wine example above, it would be the type of wine or the name of a specific brand). It is
            common practice to name this column `label`

            - a column with the text describing that specific object or class (e.g., in the wine example
            above, it could be the description that you find on the rear of the bottle label). It is
            common practice to name this column `text`

            To get reliable results, we suggest providing at least 2,000 labelled texts. If you provide
            fewer, we will still wordify your file, but the results should then be taken with a grain
            of salt.

            Consider that we also support multi-language texts, therefore you'll be able to
            automatically discriminate between international wines, even if your preferred Italian
            producer does not provide you with a description written in English!
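
            For example, a minimal CSV file (with made-up wine data, purely illustrative) could look
            like this:

            ```csv
            label,text
            Chianti,"A dry red with notes of cherry and leather."
            Riesling,"A crisp white with green apple and citrus aromas."
            ```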
            """
        )

        st.subheader("Step 2 - Upload your file and Wordify!")
        st.markdown(
            """
            Once you have prepared your Excel or CSV file, click the "Browse File" button and select
            your file. Choose the language of your texts (select multi-language if your file contains
            texts in different languages). Push the "Wordify!" button, sit back, and wait for Wordify
            to do its tricks.

            Depending on the size of your data, the process can take from one to five minutes.
            """
        )

    # ==== FAQ ==== #
    with st.beta_container():
        st.markdown("")
        st.markdown("")
        st.header(":question: Frequently Asked Questions")
        with st.beta_expander("What is Wordify?"):
            st.markdown(
                """
                Wordify is a way to find out which terms are most indicative of each of your dependent
                variable values.
                """
            )

        with st.beta_expander("What happens to my data?"):
            st.markdown(
                """
                Nothing. We never store the data you upload on disk: it is only kept in memory for the
                duration of the modeling, and then deleted. We do not retain any copies or traces of
                your data.
                """
            )

        with st.beta_expander("What input formats do you support?"):
            st.markdown(
                """
                The file you upload should be an .xlsx or .csv file with two columns: one should be
                labeled 'text' and contain all your documents (e.g., tweets, reviews, patents, etc.),
                one per row. The other column should be labeled 'label' and contain the dependent
                variable label associated with each text (e.g., rating, author gender, company, etc.).
                """
            )

        with st.beta_expander("How does it work?"):
            st.markdown(
                """
                It uses a variant of the Stability Selection algorithm
                [(Meinshausen and Bühlmann, 2010)](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2010.00740.x)
                to fit hundreds of logistic regression models on random subsets of the data, using
                different L1 penalties to drive as many term coefficients as possible to 0. Any term
                that receives a non-zero coefficient in at least 30% of all model runs can be seen as
                a stable indicator.
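
                A rough sketch of this procedure in NumPy (illustrative only, not Wordify's actual
                implementation):

```python
# Stability selection sketch: fit many L1-penalised logistic regressions on
# random half-samples and keep the terms that are non-zero in >= 30% of runs.
import numpy as np

def fit_l1_logreg(X, y, lam=0.1, lr=0.1, steps=300):
    # Proximal gradient descent for L1-penalised logistic regression.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))           # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)           # gradient step on the loss
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold (L1 prox)
    return w

def stability_selection(X, y, n_runs=200, threshold=0.3, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_runs):
        # Fit on a random half of the data, count non-zero coefficients.
        idx = rng.choice(len(y), size=max(2, len(y) // 2), replace=False)
        counts += np.abs(fit_l1_logreg(X[idx], y[idx])) > 1e-8
    return counts / n_runs >= threshold  # boolean mask of stable terms

# Toy document-term matrix (rows = documents, columns = terms).
X = np.array([[1., 0., 1.], [1., 1., 0.], [0., 0., 1.], [0., 1., 0.]])
y = np.array([1., 1., 0., 0.])
stable_terms = stability_selection(X, y)
```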
                """
            )

        with st.beta_expander("How much data do I need?"):
            st.markdown(
                """
                We recommend at least 2,000 instances; the more, the better. With fewer instances, the
                results are less replicable and reliable.
                """
            )

        with st.beta_expander("Is there a paper I can cite?"):
            st.markdown(
                """
                Yes please! Reference coming soon...
                """
            )

        with st.beta_expander("What languages are supported?"):
            st.markdown(
                f"""
                Currently we support: {", ".join([i.name for i in Languages])}.
                """
            )