import streamlit as st

# Page configuration
st.set_page_config(
    layout="wide", 
    initial_sidebar_state="auto"
)

# Custom CSS for better styling
st.markdown("""
    <style>
        .main-title {
            font-size: 36px;
            color: #4A90E2;
            font-weight: bold;
            text-align: center;
        }
        .sub-title {
            font-size: 24px;
            color: #4A90E2;
            margin-top: 20px;
        }
        .section {
            background-color: #f9f9f9;
            padding: 15px;
            border-radius: 10px;
            margin-top: 20px;
        }
        .section h2 {
            font-size: 22px;
            color: #4A90E2;
        }
        .section p, .section ul {
            color: #666666;
        }
        .link {
            color: #4A90E2;
            text-decoration: none;
        }
    </style>
""", unsafe_allow_html=True)

# Title
st.markdown('<div class="main-title">Automatically Answer Questions (CLOSED BOOK)</div>', unsafe_allow_html=True)

# Introduction Section
st.markdown("""
<div class="section">
    <p>Closed-book question answering asks a model to answer questions without access to external documents or retrieval at inference time. The model relies solely on the knowledge stored in its parameters during training, which makes the approach useful in scenarios where retrieval-based methods are not feasible.</p>
    <p>On this page, we build a Spark NLP pipeline that answers questions in a closed-book setting. It uses a T5 Transformer model fine-tuned for closed-book question answering to generate short answers to trivia-style questions.</p>
</div>
""", unsafe_allow_html=True)

# T5 Transformer Overview
st.markdown('<div class="sub-title">Understanding the T5 Transformer for Closed-Book QA</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>T5 (Text-To-Text Transfer Transformer) is a transformer model from Google that casts a wide range of NLP tasks into a single text-to-text format: the input is a string, usually starting with a short task prefix, and the output is a string. For closed-book question answering, T5 is fine-tuned to generate answers directly from its internal knowledge without relying on external sources.</p>
    <p>The model takes a question as input and generates the answer as text. Because it needs no document store or retriever, it suits applications where access to external data sources is limited or impractical.</p>
</div>
""", unsafe_allow_html=True)

# Performance Section
st.markdown('<div class="sub-title">Performance and Benchmarks</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>T5 has been evaluated in the closed-book setting on open-domain question-answering datasets such as Natural Questions and TriviaQA. In these evaluations, closed-book T5 models answered many questions correctly without referencing any external data, with accuracy generally improving as model size grows.</p>
    <p>This makes T5 a practical option for virtual assistants, educational tools, and other scenarios where the knowledge captured during training is sufficient to answer the questions being asked.</p>
</div>
""", unsafe_allow_html=True)

# Implementation Section
st.markdown('<div class="sub-title">Implementing Closed-Book Question Answering</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>The following example demonstrates how to implement a closed-book question answering pipeline using Spark NLP. The pipeline includes a document assembler, a sentence detector that splits the input into individual questions, and the T5 model that generates an answer for each question.</p>
</div>
""", unsafe_allow_html=True)

st.code('''
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, T5Transformer
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap the raw text column in Spark NLP's document annotation
document_assembler = DocumentAssembler()\\
    .setInputCol("text")\\
    .setOutputCol("documents")

# Split each document into sentences, one per question
sentence_detector = SentenceDetectorDLModel\\
    .pretrained("sentence_detector_dl", "en")\\
    .setInputCols(["documents"])\\
    .setOutputCol("questions")

# Generate an answer for each question with the closed-book T5 model
t5 = T5Transformer\\
    .pretrained("google_t5_small_ssm_nq")\\
    .setTask("trivia question:")\\
    .setInputCols(["questions"])\\
    .setOutputCol("answers")

pipeline = Pipeline().setStages([document_assembler, sentence_detector, t5])

# Run the pipeline on a single example question
data = spark.createDataFrame([["What is the capital of France?"]]).toDF("text")
result = pipeline.fit(data).transform(data)
result.select("answers.result").show(truncate=False)
''', language='python')

# Example Output
st.text("""
+-------+
|result |
+-------+
|[Paris]|
+-------+
""")

# Model Info Section
st.markdown('<div class="sub-title">Choosing the Right T5 Model</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>Several T5 models are available, each trained on different datasets and tasks. For closed-book question answering, select a model that has been fine-tuned specifically for this task. The model used in the example, "google_t5_small_ssm_nq", is a small T5 checkpoint that, as its name indicates, was fine-tuned on Natural Questions for closed-book answering.</p>
    <p>For more complex or varied question-answering tasks, consider larger T5 models such as T5-Base or T5-Large, which may offer improved accuracy at the cost of more memory and slower inference. Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Models Hub</a> to find the best fit for your application.</p>
</div>
""", unsafe_allow_html=True)

# References Section
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html" target="_blank">Google AI Blog</a>: Exploring Transfer Learning with T5</li>
        <li><a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Models Hub</a>: Explore T5 models</li>
        <li>Model used: <a class="link" href="https://sparknlp.org/2022/05/31/google_t5_small_ssm_nq_en_3_0.html" target="_blank">google_t5_small_ssm_nq</a></li>
        <li><a class="link" href="https://github.com/google-research/text-to-text-transfer-transformer" target="_blank">GitHub</a>: T5 Transformer repository</li>
        <li><a class="link" href="https://arxiv.org/abs/1910.10683" target="_blank">T5 Paper</a>: Detailed insights from the developers</li>
    </ul>
</div>
""", unsafe_allow_html=True)

st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
    </ul>
</div>
""", unsafe_allow_html=True)

st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>
        <li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>
        <li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>
    </ul>
</div>
""", unsafe_allow_html=True)