import streamlit as st
# Page configuration
st.set_page_config(
layout="wide",
initial_sidebar_state="auto"
)
# Custom CSS for better styling
st.markdown("""
<style>
.main-title {
font-size: 36px;
color: #4A90E2;
font-weight: bold;
text-align: center;
}
.sub-title {
font-size: 24px;
color: #4A90E2;
margin-top: 20px;
}
.section {
background-color: #f9f9f9;
padding: 15px;
border-radius: 10px;
margin-top: 20px;
}
.section h2 {
font-size: 22px;
color: #4A90E2;
}
.section p, .section ul {
color: #666666;
}
.link {
color: #4A90E2;
text-decoration: none;
}
.benchmark-table {
width: 100%;
border-collapse: collapse;
margin-top: 20px;
}
.benchmark-table th, .benchmark-table td {
border: 1px solid #ddd;
padding: 8px;
text-align: left;
}
.benchmark-table th {
background-color: #4A90E2;
color: white;
}
.benchmark-table td {
background-color: #f2f2f2;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown('<div class="main-title">Introduction to XLNet for Token & Sequence Classification in Spark NLP</div>', unsafe_allow_html=True)
# Subtitle
st.markdown("""
<div class="section">
<p>XLNet is a transformer-based language model pretrained with a permutation-based (generalized autoregressive) objective. By modeling all permutations of the factorization order, it captures bidirectional context without masking the input, which makes it highly effective for Natural Language Processing (NLP) tasks such as token classification and sequence classification.</p>
</div>
""", unsafe_allow_html=True)
# Tabs for XLNet Annotators
tab1, tab2 = st.tabs(["XlnetForTokenClassification", "XlnetForSequenceClassification"])
# Tab 1: XlnetForTokenClassification
with tab1:
st.markdown("""
<div class="section">
<h2>XLNet for Token Classification</h2>
<p><strong>Token Classification</strong> involves assigning labels to individual tokens (words or subwords) within a sentence. This is crucial for tasks such as Named Entity Recognition (NER), where each token is classified as a specific entity like a person, organization, or location.</p>
<p>XLNet, with its robust contextual understanding, is particularly suited for token classification tasks. Its permutation-based training enables the model to capture dependencies across different parts of a sentence, improving accuracy in token-level predictions.</p>
<p>Using XLNet for token classification enables:</p>
<ul>
<li><strong>Accurate NER:</strong> Extract entities from text with high precision.</li>
<li><strong>Contextual Understanding:</strong> Benefit from XLNet's ability to model bidirectional context for each token.</li>
<li><strong>Scalability:</strong> Efficiently process large-scale datasets using Spark NLP.</li>
</ul>
</div>
""", unsafe_allow_html=True)
# Implementation Section
st.markdown('<div class="sub-title">How to Use XLNet for Token Classification in Spark NLP</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>Below is an example of how to set up a pipeline in Spark NLP using the XLNet model for token classification, specifically for Named Entity Recognition (NER).</p>
</div>
""", unsafe_allow_html=True)
st.code('''
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
document_assembler = DocumentAssembler() \\
.setInputCol('text') \\
.setOutputCol('document')
tokenizer = Tokenizer() \\
.setInputCols(['document']) \\
.setOutputCol('token')
tokenClassifier = XlnetForTokenClassification \\
.pretrained('xlnet_base_token_classifier_conll03', 'en') \\
.setInputCols(['token', 'document']) \\
.setOutputCol('ner') \\
.setCaseSensitive(True) \\
.setMaxSentenceLength(512)
ner_converter = NerConverter() \\
.setInputCols(['document', 'token', 'ner']) \\
.setOutputCol('entities')
pipeline = Pipeline(stages=[
document_assembler,
tokenizer,
tokenClassifier,
ner_converter
])
example = spark.createDataFrame([['My name is John!']]).toDF("text")
result = pipeline.fit(example).transform(example)
''', language='python')
# Example Output
st.text("""
+--------+-----+
|entities|label|
+--------+-----+
|John    |PER  |
+--------+-----+
""")
# Model Info Section
st.markdown('<div class="sub-title">Choosing the Right XLNet Model</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>Spark NLP offers various XLNet models tailored for token classification tasks. Selecting the appropriate model can significantly impact performance.</p>
<p>Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=XlnetForTokenClassification" target="_blank">Spark NLP Models Hub</a> to find the one that fits your needs.</p>
</div>
""", unsafe_allow_html=True)
# Tab 2: XlnetForSequenceClassification
with tab2:
st.markdown("""
<div class="section">
<h2>XLNet for Sequence Classification</h2>
<p><strong>Sequence Classification</strong> is the task of assigning a label to an entire sequence of text, such as determining the sentiment of a review or categorizing a document into topics. XLNet's ability to model long-range dependencies makes it particularly effective for sequence classification.</p>
<p>Using XLNet for sequence classification enables:</p>
<ul>
<li><strong>Sentiment Analysis:</strong> Accurately determine the sentiment of text.</li>
<li><strong>Document Classification:</strong> Categorize documents based on their content.</li>
<li><strong>Robust Performance:</strong> Benefit from XLNet's permutation-based training for improved classification accuracy.</li>
</ul>
</div>
""", unsafe_allow_html=True)
# Implementation Section
st.markdown('<div class="sub-title">How to Use XLNet for Sequence Classification in Spark NLP</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>The following example demonstrates how to set up a pipeline in Spark NLP using the XLNet model for sequence classification, particularly for sentiment analysis of movie reviews.</p>
</div>
""", unsafe_allow_html=True)
st.code('''
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
document_assembler = DocumentAssembler() \\
.setInputCol('text') \\
.setOutputCol('document')
tokenizer = Tokenizer() \\
.setInputCols(['document']) \\
.setOutputCol('token')
sequenceClassifier = XlnetForSequenceClassification \\
.pretrained('xlnet_base_sequence_classifier_imdb', 'en') \\
.setInputCols(['token', 'document']) \\
.setOutputCol('class') \\
.setCaseSensitive(False) \\
.setMaxSentenceLength(512)
pipeline = Pipeline(stages=[
document_assembler,
tokenizer,
sequenceClassifier
])
example = spark.createDataFrame([['I really liked that movie!']]).toDF("text")
result = pipeline.fit(example).transform(example)
''', language='python')
# Example Output
st.text("""
+----------+
|class     |
+----------+
|[positive]|
+----------+
""")
# Model Info Section
st.markdown('<div class="sub-title">Choosing the Right XLNet Model</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>Various XLNet models are available for sequence classification in Spark NLP. Each model is fine-tuned for specific tasks, so selecting the right one is crucial for achieving optimal performance.</p>
<p>Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=XlnetForSequenceClassification" target="_blank">Spark NLP Models Hub</a> to find the best fit for your use case.</p>
</div>
""", unsafe_allow_html=True)
# Footer
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
<li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
<li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
<li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
<li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
</ul>
</div>
""", unsafe_allow_html=True)
st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>
<li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>
<li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>
<li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>
</ul>
</div>
""", unsafe_allow_html=True)