import streamlit as st

# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)

# Introduction
st.markdown('

Part-of-Speech Tagging with Spark NLP

', unsafe_allow_html=True)

st.markdown("""

Welcome to the Spark NLP Part-of-Speech Tagging Demo App! Part-of-Speech (POS) tagging is an essential task in Natural Language Processing (NLP) that identifies the grammatical category of each word in a text, such as noun, verb, or adjective. This app demonstrates how to use the PerceptronModel annotator to perform POS tagging on text data with Spark NLP.

""", unsafe_allow_html=True)

st.image('images/POS-Tagging.png', caption="Dependency parsing with POS tags", use_column_width='auto')

# About POS Tagging
st.markdown('

About Part-of-Speech Tagging

', unsafe_allow_html=True)

st.markdown("""

Part-of-Speech (POS) tagging assigns each word in a sentence its grammatical category, such as noun, verb, or adjective. These tags serve as input features for many other NLP tasks, including Named Entity Recognition (NER), Word Sense Disambiguation (WSD), Question Answering (QA), and Dependency Parsing (DP).

For instance, knowing that a word is an adjective increases the likelihood that one of the neighboring words is a noun. Context can also change a word's part of speech, and with it the meaning:

What is your address? (noun)
I will address this issue today. (verb)

POS tags are drawn from a tag set (schema), such as the Universal Dependencies POS tags or the Penn Treebank POS tags. Each schema defines its own set of labels for the grammatical categories.
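
As a rough illustration of how two schemas relate, the sketch below maps a handful of Penn Treebank tags to Universal Dependencies (UPOS) tags. The `PTB_TO_UPOS` table and the `to_upos` helper are hypothetical names introduced here, not part of Spark NLP. The table is deliberately partial, and a pure tag-level lookup is only an approximation, since real conversions also depend on context (for example, Penn's `IN` covers both prepositions and subordinating conjunctions). The tags on the two "address" examples are assigned by hand for illustration.

```python
# Partial, illustrative mapping from Penn Treebank tags to UD UPOS tags.
# Assumption: tag-level lookup only; a faithful conversion also needs context.
PTB_TO_UPOS = {
    "NN": "NOUN", "NNS": "NOUN",
    "NNP": "PROPN", "NNPS": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "IN": "ADP", "PRP": "PRON", "DT": "DET",
}


def to_upos(ptb_tag):
    # Fall back to "X" (the UPOS label for "other") for tags not in the table.
    return PTB_TO_UPOS.get(ptb_tag, "X")


# The same surface form carries a different tag depending on context.
# (sentence, word of interest, hand-assigned Penn Treebank tag)
examples = [
    ("What is your address?", "address", "NN"),
    ("I will address this issue today.", "address", "VB"),
]

for sentence, word, tag in examples:
    print(f"{sentence!r}: {word} -> {tag} -> {to_upos(tag)}")
```

The first example resolves to `NOUN` and the second to `VERB`, mirroring the noun/verb contrast above.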

""", unsafe_allow_html=True)

# Using PerceptronModel for POS Tagging
st.markdown('

Using PerceptronModel for POS Tagging in Spark NLP

', unsafe_allow_html=True)

st.markdown("""

The PerceptronModel annotator in Spark NLP is an averaged perceptron tagger that labels each token in a text with its part of speech. Pretrained models are available, so it can be added to a pipeline without any training.

It offers:

Accurate POS tagging using pretrained models
Identification and labeling of grammatical roles
Efficient processing of large text datasets
Integration with other Spark NLP components for comprehensive NLP pipelines

""", unsafe_allow_html=True)

st.markdown('

Example Usage in Python

', unsafe_allow_html=True)

st.markdown('

Here’s how you can implement POS tagging using the PerceptronModel annotator in Spark NLP:

', unsafe_allow_html=True)

# Setup Instructions
st.markdown('

Setup

', unsafe_allow_html=True)

st.markdown('

To install Spark NLP in Python, use your favorite package manager (conda, pip, etc.). For example:

', unsafe_allow_html=True)

st.code("""
pip install spark-nlp
pip install pyspark
""", language="bash")

st.markdown("

Then, import Spark NLP and start a Spark session:

", unsafe_allow_html=True)

st.code("""
import sparknlp

# Start Spark Session
spark = sparknlp.start()
""", language='python')

# POS Tagging Example
st.markdown('

Example Usage: POS Tagging with PerceptronModel

', unsafe_allow_html=True)

st.code('''
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, PerceptronModel
from pyspark.ml import Pipeline
import pyspark.sql.functions as F

# Stage 1: Transforms raw texts to document annotation
document_assembler = DocumentAssembler() \\
    .setInputCol("text") \\
    .setOutputCol("document")

# Stage 2: Tokenization
tokenizer = Tokenizer() \\
    .setInputCols(["document"]) \\
    .setOutputCol("token")

# Stage 3: Perceptron model for POS Tagger
# Pretrained model pos_anc for texts in English
postagger = PerceptronModel.pretrained("pos_anc", "en") \\
    .setInputCols(["document", "token"]) \\
    .setOutputCol("pos")

# Define the pipeline
pipeline = Pipeline(stages=[document_assembler, tokenizer, postagger])

# Create the dataframe
data = spark.createDataFrame([["Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul"]]).toDF("text")

# Fit the pipeline on the dataframe to get the model
model = pipeline.fit(data)

# Transform the data to get predictions
result = model.transform(data)

# Display the POS tags
result.select(
    F.explode(
        F.arrays_zip(result.token.result, result.token.begin, result.token.end, result.pos.result)
    ).alias("cols")
).select(
    F.expr("cols['0']").alias("token"),
    F.expr("cols['1']").alias("begin"),
    F.expr("cols['2']").alias("end"),
    F.expr("cols['3']").alias("pos"),
).show(truncate=False)
''', language='python')

st.text("""
+------------+-----+---+---+
|token       |begin|end|pos|
+------------+-----+---+---+
|Unions      |0    |5  |NNP|
|representing|7    |18 |VBG|
|workers     |20   |26 |NNS|
|at          |28   |29 |IN |
|Turner      |31   |36 |NNP|
|Newall      |38   |43 |NNP|
|say         |45   |47 |VBP|
|they        |49   |52 |PRP|
|are         |54   |56 |VBP|
|'           |58   |58 |POS|
|disappointed|59   |70 |JJ |
|'           |71   |71 |POS|
|after       |73   |77 |IN |
|talks       |79   |83 |NNS|
|with        |85   |88 |IN |
|stricken    |90   |97 |NN |
|parent      |99   |104|NN |
|firm        |106  |109|NN |
|Federal     |111  |117|NNP|
|Mogul       |119  |123|NNP|
+------------+-----+---+---+
""")

st.markdown("""

The code snippet demonstrates how to set up a pipeline in Spark NLP to perform POS tagging on text data using the PerceptronModel annotator. The resulting DataFrame contains the tokens and their corresponding POS tags.
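
Once the tags are available, they can be post-processed with ordinary Python. The sketch below copies the rows of the output table above into plain tuples, so it runs without Spark. It illustrates two points: the `begin` and `end` offsets are inclusive character positions, and runs of consecutive `NNP` tokens can be merged into candidate proper-noun phrases. The helper names `token_text` and `proper_noun_phrases` are hypothetical and not part of Spark NLP.

```python
from itertools import groupby

text = ("Unions representing workers at Turner Newall say they are "
        "'disappointed' after talks with stricken parent firm Federal Mogul")

# (token, begin, end, pos) rows copied from the output table.
rows = [
    ("Unions", 0, 5, "NNP"), ("representing", 7, 18, "VBG"),
    ("workers", 20, 26, "NNS"), ("at", 28, 29, "IN"),
    ("Turner", 31, 36, "NNP"), ("Newall", 38, 43, "NNP"),
    ("say", 45, 47, "VBP"), ("they", 49, 52, "PRP"),
    ("are", 54, 56, "VBP"), ("'", 58, 58, "POS"),
    ("disappointed", 59, 70, "JJ"), ("'", 71, 71, "POS"),
    ("after", 73, 77, "IN"), ("talks", 79, 83, "NNS"),
    ("with", 85, 88, "IN"), ("stricken", 90, 97, "NN"),
    ("parent", 99, 104, "NN"), ("firm", 106, 109, "NN"),
    ("Federal", 111, 117, "NNP"), ("Mogul", 119, 123, "NNP"),
]


def token_text(text, begin, end):
    # Offsets are inclusive, so the slice has to run to end + 1.
    return text[begin:end + 1]


def proper_noun_phrases(rows):
    # Merge each run of consecutive NNP tokens into a single phrase.
    phrases = []
    for is_nnp, group in groupby(rows, key=lambda row: row[3] == "NNP"):
        if is_nnp:
            phrases.append(" ".join(row[0] for row in group))
    return phrases


# Every token can be recovered from the text using its inclusive offsets.
assert all(token_text(text, b, e) == tok for tok, b, e, _ in rows)

print(proper_noun_phrases(rows))
# prints ['Unions', 'Turner Newall', 'Federal Mogul']
```

"Unions" shows up in the result only because the model tagged it `NNP` in the table above, so downstream logic of this kind inherits whatever tagging errors the model makes.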

""", unsafe_allow_html=True)

# One-liner Alternative
st.markdown('

One-liner Alternative

', unsafe_allow_html=True)

st.markdown("""

In October 2022, John Snow Labs released the open-source johnsnowlabs library, which bundles all of the company's products, open-source and licensed, in one common library. This simplifies the workflow, especially for users working with more than one of the libraries (e.g., Spark NLP + Healthcare NLP). The new library is a wrapper around all of John Snow Labs' libraries and can be installed with pip:

pip install johnsnowlabs

To run POS tagging with one line of code, we can simply run:

""", unsafe_allow_html=True)

st.code("""
# Import the NLP module which contains Spark NLP and NLU libraries
from johnsnowlabs import nlp

example_sentence = "Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul"

# Returns a pandas DataFrame, we select the desired columns
nlp.load("pos").predict(example_sentence)[['token','pos']]
""", language='python')

st.image('images/johnsnowlabs-output.png', use_column_width='auto')

# Summary
st.markdown('

Summary

', unsafe_allow_html=True)

st.markdown("""

In this demo app, we showcased how to perform Part-of-Speech tagging using the PerceptronModel annotator in Spark NLP. POS tagging is a crucial step in many NLP applications, helping to understand the grammatical structure and context of the text. With Spark NLP, you can efficiently process and analyze large volumes of text data, leveraging powerful pretrained models for accurate and reliable results.

We hope you found this demo helpful and encourage you to explore more features and capabilities of Spark NLP for your NLP projects!

""", unsafe_allow_html=True)

# References and Additional Information
st.markdown('

For additional information, please check the following references.

', unsafe_allow_html=True)

st.markdown("""

Spark NLP documentation page for all available annotators
Python API documentation for PerceptronModel and Dependency Parser
Scala API documentation for PerceptronModel and DependencyParserModel
For extended examples of usage of Spark NLP annotators, check the Spark NLP Workshop repository.
Minsky, M.L. and Papert, S.A. (1969) Perceptrons. MIT Press, Cambridge.

""", unsafe_allow_html=True)

st.markdown('

Community & Support

', unsafe_allow_html=True)

st.markdown("""

Official Website: Documentation and examples
Slack: Live discussion with the community and team
GitHub: Bug reports, feature requests, and contributions
Medium: Spark NLP articles
YouTube: Video tutorials

""", unsafe_allow_html=True)