import streamlit as st

# Page configuration
st.set_page_config(
    layout="wide", 
    initial_sidebar_state="auto"
)

# Custom CSS for better styling
st.markdown("""
    <style>
        .main-title {
            font-size: 36px;
            color: #4A90E2;
            font-weight: bold;
            text-align: center;
        }
        .sub-title {
            font-size: 24px;
            color: #4A90E2;
            margin-top: 20px;
        }
        .section {
            background-color: #f9f9f9;
            padding: 15px;
            border-radius: 10px;
            margin-top: 20px;
        }
        .section h2 {
            font-size: 22px;
            color: #4A90E2;
        }
        .section p, .section ul {
            color: #666666;
        }
        .link {
            color: #4A90E2;
            text-decoration: none;
        }
    </style>
""", unsafe_allow_html=True)

# Title
st.markdown('<div class="main-title">Automatically Answer Questions (CLOSED BOOK)</div>', unsafe_allow_html=True)

# Introduction Section
st.markdown("""
<div class="section">
    <p>Closed-book question answering is a challenging task in which a model must answer questions without access to external documents at inference time. It relies solely on the knowledge stored in the model's parameters, which makes it useful in scenarios where retrieval-based methods are not feasible.</p>
    <p>On this page, we explore how to build a pipeline that answers questions in a closed-book setting. We use a T5 Transformer model fine-tuned for closed-book question answering to answer a variety of trivia questions.</p>
</div>
""", unsafe_allow_html=True)

# T5 Transformer Overview
st.markdown('<div class="sub-title">Understanding the T5 Transformer for Closed-Book QA</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>The T5 (Text-To-Text Transfer Transformer) model by Google is an encoder-decoder transformer that casts every NLP task in a unified text-to-text format: the input is a string and the output is a string. For closed-book question answering, T5 is fine-tuned to generate the answer directly from the question, using only its internal knowledge and no external sources.</p>
    <p>The model takes a question as input and generates the answer as free text, with no supporting passage. This makes it suitable for applications where access to external data sources is limited or impractical.</p>
</div>
""", unsafe_allow_html=True)

# Performance Section
st.markdown('<div class="sub-title">Performance and Benchmarks</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>T5 has been benchmarked in the closed-book setting on open-domain question-answering datasets such as Natural Questions and TriviaQA. In these evaluations, closed-book T5 was competitive with systems that retrieve external documents, and its accuracy improved with model size.</p>
    <p>This makes T5 a practical option for applications such as virtual assistants and educational tools, wherever the model's stored knowledge is sufficient to answer the questions.</p>
</div>
""", unsafe_allow_html=True)

# Implementation Section
st.markdown('<div class="sub-title">Implementing Closed-Book Question Answering</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>The following example builds a closed-book question answering pipeline with Spark NLP. The pipeline has three stages: a document assembler, a sentence detector that splits the input into individual questions, and the T5 model that generates the answers.</p>
</div>
""", unsafe_allow_html=True)

st.code('''
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, T5Transformer
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP available
spark = sparknlp.start()

# Wrap the raw text column in Spark NLP's document annotation
document_assembler = DocumentAssembler()\\
    .setInputCol("text")\\
    .setOutputCol("documents")

# Split each document into sentences so every question is answered separately
sentence_detector = SentenceDetectorDLModel\\
    .pretrained("sentence_detector_dl", "en")\\
    .setInputCols(["documents"])\\
    .setOutputCol("questions")

# Generate an answer for each question from the model's stored knowledge
t5 = T5Transformer\\
    .pretrained("google_t5_small_ssm_nq")\\
    .setTask("trivia question:")\\
    .setInputCols(["questions"])\\
    .setOutputCol("answers")

pipeline = Pipeline().setStages([document_assembler, sentence_detector, t5])

data = spark.createDataFrame([["What is the capital of France?"]]).toDF("text")
result = pipeline.fit(data).transform(data)
result.select("answers.result").show(truncate=False)
''', language='python')

# Example Output
st.text("""
+-------+
|result |
+-------+
|[Paris]|
+-------+
""")
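
# A minimal sketch, not part of the original page: for one-off questions, Spark NLP's
# LightPipeline can run a fitted pipeline on a plain string. Assuming `annotated` is the
# dict returned by `LightPipeline(pipeline.fit(data)).annotate(text)`, keyed by the output
# column names used in the example ("questions", "answers"), the hypothetical helper below
# pairs each detected question with its generated answer. It also assumes the T5 stage
# emits one answer per question, which should be checked against real output.
def pair_questions_with_answers(annotated):
    """Return (question, answer) tuples from a LightPipeline-style result dict."""
    questions = annotated.get("questions", [])
    answers = annotated.get("answers", [])
    if len(questions) != len(answers):
        # Fail loudly instead of silently dropping unmatched items in zip()
        raise ValueError("expected exactly one answer per question")
    return list(zip(questions, answers))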

# Model Info Section
st.markdown('<div class="sub-title">Choosing the Right T5 Model</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>Several T5 models are available, each pre-trained or fine-tuned on different datasets and tasks. For closed-book question answering, select a model that has been fine-tuned specifically for this task. The model used in the example, "google_t5_small_ssm_nq", is a small T5 model that was further pre-trained with salient span masking (SSM) and then fine-tuned on the Natural Questions (NQ) dataset.</p>
    <p>For harder or more varied questions, consider a larger T5 variant such as T5-Base or T5-Large, which may improve accuracy at the cost of more memory and slower inference. Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Models Hub</a> to find the best fit for your application.</p>
</div>
""", unsafe_allow_html=True)

# References Section
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html" target="_blank">Google AI Blog</a>: Exploring Transfer Learning with T5</li>
        <li><a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Model Hub</a>: Explore T5 models</li>
        <li>Model used: <a class="link" href="https://sparknlp.org/2022/05/31/google_t5_small_ssm_nq_en_3_0.html" target="_blank">google_t5_small_ssm_nq</a></li>
        <li><a class="link" href="https://github.com/google-research/text-to-text-transfer-transformer" target="_blank">GitHub</a>: T5 Transformer repository</li>
        <li><a class="link" href="https://arxiv.org/abs/1910.10683" target="_blank">T5 Paper</a>: Detailed insights from the developers</li>
    </ul>
</div>
""", unsafe_allow_html=True)

st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
    </ul>
</div>
""", unsafe_allow_html=True)

st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>
        <li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>
        <li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>
    </ul>
</div>
""", unsafe_allow_html=True)