File size: 10,441 Bytes
9429ca9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
import streamlit as st

# Page configuration
st.set_page_config(
    layout="wide", 
    initial_sidebar_state="auto"
)

# Custom CSS for better styling
st.markdown("""

    <style>

        .main-title {

            font-size: 36px;

            color: #4A90E2;

            font-weight: bold;

            text-align: center;

        }

        .sub-title {

            font-size: 24px;

            color: #4A90E2;

            margin-top: 20px;

        }

        .section {

            background-color: #f9f9f9;

            padding: 15px;

            border-radius: 10px;

            margin-top: 20px;

        }

        .section h2 {

            font-size: 22px;

            color: #4A90E2;

        }

        .section p, .section ul {

            color: #666666;

        }

        .link {

            color: #4A90E2;

            text-decoration: none;

        }

    </style>

""", unsafe_allow_html=True)

# Title
st.markdown('<div class="main-title">Automatically Answer Questions (OPEN BOOK)</div>', unsafe_allow_html=True)

# Introduction Section
st.markdown("""

<div class="section">

    <p>Open-book question answering is a task where a model generates answers based on provided text or documents. Unlike closed-book models, open-book models utilize external sources to produce responses, making them more accurate and versatile in scenarios where the input text provides essential context.</p>

    <p>This page explores how to implement an open-book question-answering pipeline using state-of-the-art NLP techniques. We use a T5 Transformer model, which is well-suited for generating detailed answers by leveraging the information contained within the input text.</p>

</div>

""", unsafe_allow_html=True)

# T5 Transformer Overview
st.markdown('<div class="sub-title">Understanding the T5 Transformer for Open-Book QA</div>', unsafe_allow_html=True)

st.markdown("""

<div class="section">

    <p>The T5 (Text-To-Text Transfer Transformer) model by Google excels in converting various NLP tasks into a unified text-to-text format. For open-book question answering, the model takes a question and relevant context as input, generating a detailed and contextually appropriate answer.</p>

    <p>The T5 model's ability to utilize provided documents makes it especially powerful in applications where the accuracy of the response is enhanced by access to supporting information, such as research tools, educational applications, or any system where the input text contains critical data.</p>

</div>

""", unsafe_allow_html=True)

# Performance Section
st.markdown('<div class="sub-title">Performance and Benchmarks</div>', unsafe_allow_html=True)

st.markdown("""

<div class="section">

    <p>In open-book settings, the T5 model has been benchmarked across various datasets, demonstrating its capability to generate accurate and comprehensive answers when given relevant context. Its performance has been particularly strong in tasks requiring a deep understanding of the input text to produce correct and context-aware responses.</p>

    <p>Open-book T5 models are especially valuable in applications that require dynamic interaction with content, making them ideal for domains such as customer support, research, and educational technologies.</p>

</div>

""", unsafe_allow_html=True)

# Implementation Section
st.markdown('<div class="sub-title">Implementing Open-Book Question Answering</div>', unsafe_allow_html=True)

st.markdown("""

<div class="section">

    <p>The following example demonstrates how to implement an open-book question answering pipeline using Spark NLP. The pipeline includes a document assembler and the T5 model to generate answers based on the input text.</p>

</div>

""", unsafe_allow_html=True)

st.code('''

from sparknlp.base import *

from sparknlp.annotator import *

from pyspark.ml import Pipeline

from pyspark.sql.functions import col, expr



document_assembler = DocumentAssembler()\\

    .setInputCol("text")\\

    .setOutputCol("documents")



t5 = T5Transformer()\\

    .pretrained(model_name)\\

    .setTask("question:")\\

    .setMaxOutputLength(200)\\

    .setInputCols(["documents"])\\

    .setOutputCol("answers")



pipeline = Pipeline().setStages([document_assembler, t5])



data = spark.createDataFrame([["What is the impact of climate change on polar bears?"]]).toDF("text")

result = pipeline.fit(data).transform(data)

result.select("answers.result").show(truncate=False)

''', language='python')

# Example Output
st.text("""

+------------------------------------------------+

|answers.result                                   |

+------------------------------------------------+

|Climate change significantly affects polar ...  |

+------------------------------------------------+

""")

# Model Info Section
st.markdown('<div class="sub-title">Choosing the Right Model for Open-Book QA</div>', unsafe_allow_html=True)

st.markdown("""

<div class="section">

    <p>When selecting a model for open-book question answering, it's important to consider the specific needs of your application. Below are some of the available models, each offering different strengths based on their transformer architecture:</p>

    <ul>

        <li><b>t5_base</b>: A versatile model that provides strong performance on question-answering tasks, ideal for applications requiring detailed answers.</li>

        <li><b>t5_small</b>: A more lightweight variant of T5, suitable for applications where resource efficiency is crucial, though it may not be as accurate as larger models.</li>

        <li><b>albert_qa_xxlarge_tweetqa</b>: Based on the ALBERT architecture, this model is fine-tuned for the TweetQA dataset, making it effective for answering questions in shorter text formats.</li>

        <li><b>bert_qa_callmenicky_finetuned_squad</b>: A fine-tuned BERT model that offers a good balance between accuracy and computational efficiency, suitable for general-purpose QA tasks.</li>

        <li><b>deberta_v3_xsmall_qa_squad2</b>: A smaller DeBERTa model, optimized for high accuracy on SQuAD2 while being resource-efficient, making it great for smaller deployments.</li>

        <li><b>distilbert_base_cased_qa_squad2</b>: A distilled version of BERT, offering faster inference times with slightly reduced accuracy, suitable for environments with limited resources.</li>

        <li><b>longformer_qa_large_4096_finetuned_triviaqa</b>: This model is particularly well-suited for open-book QA tasks involving long documents, as it can handle extended contexts effectively.</li>

        <li><b>roberta_qa_roberta_base_squad2_covid</b>: A RoBERTa-based model fine-tuned for COVID-related QA, making it highly specialized for health-related domains.</li>

        <li><b>roberta_qa_CV_Merge_DS</b>: Another RoBERTa model, fine-tuned on a diverse dataset, offering versatility across different domains and question types.</li>

        <li><b>xlm_roberta_base_qa_squad2</b>: A multilingual model fine-tuned on SQuAD2, ideal for QA tasks across various languages.</li>

    </ul>

    <p>Among these models, <b>t5_base</b> and <b>longformer_qa_large_4096_finetuned_triviaqa</b> are highly recommended for their strong performance in generating accurate and contextually rich answers, especially in scenarios with long input texts. For faster responses with an emphasis on efficiency, <b>distilbert_base_cased_qa_squad2</b> and <b>deberta_v3_xsmall_qa_squad2</b> are excellent choices. Specialized tasks may benefit from models like <b>albert_qa_xxlarge_tweetqa</b> or <b>roberta_qa_roberta_base_squad2_covid</b>, depending on the domain.</p>

    <p>Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Models Hub</a> to find the one that best suits your needs.</p>

</div>

""", unsafe_allow_html=True)


# Footer
# References Section
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)

st.markdown("""

<div class="section">

    <ul>

        <li><a class="link" href="https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html" target="_blank">Google AI Blog</a>: Exploring Transfer Learning with T5</li>

        <li><a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Model Hub</a>: Explore T5 models</li>

        <li><a class="link" href="https://github.com/google-research/text-to-text-transfer-transformer" target="_blank">GitHub</a>: T5 Transformer repository</li>

        <li><a class="link" href="https://arxiv.org/abs/1910.10683" target="_blank">T5 Paper</a>: Detailed insights from the developers</li>

    </ul>

</div>

""", unsafe_allow_html=True)

st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)

st.markdown("""

<div class="section">

    <ul>

        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>

        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>

        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>

        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>

        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>

    </ul>

</div>

""", unsafe_allow_html=True)

st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)

st.markdown("""

<div class="section">

    <ul>

        <li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>

        <li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>

        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>

        <li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>

    </ul>

</div>

""", unsafe_allow_html=True)