import streamlit as st

# Page configuration
st.set_page_config(
    layout="wide", 
    initial_sidebar_state="auto"
)

# Custom CSS for better styling
st.markdown("""

    <style>

        .main-title {

            font-size: 36px;

            color: #4A90E2;

            font-weight: bold;

            text-align: center;

        }

        .sub-title {

            font-size: 24px;

            color: #4A90E2;

            margin-top: 20px;

        }

        .section {

            background-color: #f9f9f9;

            padding: 15px;

            border-radius: 10px;

            margin-top: 20px;

        }

        .section h2 {

            font-size: 22px;

            color: #4A90E2;

        }

        .section p, .section ul {

            color: #666666;

        }

        .link {

            color: #4A90E2;

            text-decoration: none;

        }

        .benchmark-table {

            width: 100%;

            border-collapse: collapse;

            margin-top: 20px;

        }

        .benchmark-table th, .benchmark-table td {

            border: 1px solid #ddd;

            padding: 8px;

            text-align: left;

        }

        .benchmark-table th {

            background-color: #4A90E2;

            color: white;

        }

        .benchmark-table td {

            background-color: #f2f2f2;

        }

    </style>

""", unsafe_allow_html=True)

# Title
st.markdown('<div class="main-title">Introduction to XLNet for Token & Sequence Classification in Spark NLP</div>', unsafe_allow_html=True)

# Subtitle
st.markdown("""

<div class="section">

    <p>XLNet is a transformer-based language model trained with a permutation language modeling objective, which lets it capture bidirectional context without relying on masked tokens. This makes it highly effective for NLP tasks such as token classification and sequence classification.</p>

</div>

""", unsafe_allow_html=True)

# Tabs for XLNet Annotators
tab1, tab2 = st.tabs(["XlnetForTokenClassification", "XlnetForSequenceClassification"])

# Tab 1: XlnetForTokenClassification
with tab1:
    st.markdown("""

    <div class="section">

        <h2>XLNet for Token Classification</h2>

        <p><strong>Token Classification</strong> assigns a label to each token (word or subword) in a sentence. It is the basis of tasks such as Named Entity Recognition (NER), where tokens are labeled with entity types like person, organization, or location.</p>

        <p>XLNet, with its robust contextual understanding, is particularly suited for token classification tasks. Its permutation-based training enables the model to capture dependencies across different parts of a sentence, improving accuracy in token-level predictions.</p>

        <p>Using XLNet for token classification enables:</p>

        <ul>

            <li><strong>Accurate NER:</strong> Extract entities from text with high precision.</li>

            <li><strong>Contextual Understanding:</strong> Benefit from XLNet's ability to model bidirectional context for each token.</li>

            <li><strong>Scalability:</strong> Efficiently process large-scale datasets using Spark NLP.</li>

        </ul>

    </div>

    """, unsafe_allow_html=True)

    # Implementation Section
    st.markdown('<div class="sub-title">How to Use XLNet for Token Classification in Spark NLP</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <p>Below is an example of how to set up a pipeline in Spark NLP using the XLNet model for token classification, specifically for Named Entity Recognition (NER).</p>

    </div>

    """, unsafe_allow_html=True)

    st.code('''
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start Spark NLP (provides the SparkSession used below)
spark = sparknlp.start()

document_assembler = DocumentAssembler() \\
    .setInputCol('text') \\
    .setOutputCol('document')

tokenizer = Tokenizer() \\
    .setInputCols(['document']) \\
    .setOutputCol('token')

tokenClassifier = XlnetForTokenClassification \\
    .pretrained('xlnet_base_token_classifier_conll03', 'en') \\
    .setInputCols(['token', 'document']) \\
    .setOutputCol('ner') \\
    .setCaseSensitive(True) \\
    .setMaxSentenceLength(512)

ner_converter = NerConverter() \\
    .setInputCols(['document', 'token', 'ner']) \\
    .setOutputCol('entities')

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    tokenClassifier,
    ner_converter
])

example = spark.createDataFrame([['My name is John!']]).toDF("text")
result = pipeline.fit(example).transform(example)
''', language='python')

    # Example Output
    st.text("""
    +---------+---------+
    |entities |label    |
    +---------+---------+
    |John     |PER      |
    +---------+---------+
    """)

    # Model Info Section
    st.markdown('<div class="sub-title">Choosing the Right XLNet Model</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <p>Spark NLP offers various XLNet models tailored for token classification tasks. Selecting the appropriate model can significantly impact performance.</p>

        <p>Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=XlnetForTokenClassification" target="_blank">Spark NLP Models Hub</a> to find the one that fits your needs.</p>

    </div>

    """, unsafe_allow_html=True)

# Tab 2: XlnetForSequenceClassification
with tab2:
    st.markdown("""

    <div class="section">

        <h2>XLNet for Sequence Classification</h2>

        <p><strong>Sequence Classification</strong> is the task of assigning a label to an entire sequence of text, such as determining the sentiment of a review or categorizing a document into topics. XLNet's ability to model long-range dependencies makes it particularly effective for sequence classification.</p>

        <p>Using XLNet for sequence classification enables:</p>

        <ul>

            <li><strong>Sentiment Analysis:</strong> Accurately determine the sentiment of text.</li>

            <li><strong>Document Classification:</strong> Categorize documents based on their content.</li>

            <li><strong>Robust Performance:</strong> Benefit from XLNet's permutation-based training for improved classification accuracy.</li>

        </ul>

    </div>

    """, unsafe_allow_html=True)

    # Implementation Section
    st.markdown('<div class="sub-title">How to Use XLNet for Sequence Classification in Spark NLP</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <p>The following example demonstrates how to set up a pipeline in Spark NLP using the XLNet model for sequence classification, particularly for sentiment analysis of movie reviews.</p>

    </div>

    """, unsafe_allow_html=True)

    st.code('''
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start Spark NLP (provides the SparkSession used below)
spark = sparknlp.start()

document_assembler = DocumentAssembler() \\
    .setInputCol('text') \\
    .setOutputCol('document')

tokenizer = Tokenizer() \\
    .setInputCols(['document']) \\
    .setOutputCol('token')

sequenceClassifier = XlnetForSequenceClassification \\
    .pretrained('xlnet_base_sequence_classifier_imdb', 'en') \\
    .setInputCols(['token', 'document']) \\
    .setOutputCol('class') \\
    .setCaseSensitive(False) \\
    .setMaxSentenceLength(512)

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier
])

example = spark.createDataFrame([['I really liked that movie!']]).toDF("text")
result = pipeline.fit(example).transform(example)
''', language='python')

    # Example Output
    st.text("""
    +------------------------+
    |class                   |
    +------------------------+
    |[positive]              |
    +------------------------+
    """)

    # Model Info Section
    st.markdown('<div class="sub-title">Choosing the Right XLNet Model</div>', unsafe_allow_html=True)
    st.markdown("""

    <div class="section">

        <p>Various XLNet models are available for sequence classification in Spark NLP. Each model is fine-tuned for specific tasks, so selecting the right one is crucial for achieving optimal performance.</p>

        <p>Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=XlnetForSequenceClassification" target="_blank">Spark NLP Models Hub</a> to find the best fit for your use case.</p>

    </div>

    """, unsafe_allow_html=True)


# Footer
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)

st.markdown("""

<div class="section">

    <ul>

        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>

        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>

        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>

        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>

        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>

    </ul>

</div>

""", unsafe_allow_html=True)

st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)

st.markdown("""

<div class="section">

    <ul>

        <li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>

        <li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>

        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>

        <li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>

    </ul>

</div>

""", unsafe_allow_html=True)