import streamlit as st

# Custom CSS for better styling
st.markdown("""
    <style>
        .main-title {
            font-size: 36px;
            color: #4A90E2;
            font-weight: bold;
            text-align: center;
        }
        .sub-title {
            font-size: 24px;
            color: #4A90E2;
            margin-top: 20px;
        }
        .section {
            background-color: #f9f9f9;
            padding: 15px;
            border-radius: 10px;
            margin-top: 20px;
        }
        .section h2 {
            font-size: 22px;
            color: #4A90E2;
        }
        .section p, .section ul {
            color: #666666;
        }
        .link {
            color: #4A90E2;
            text-decoration: none;
        }
    </style>
""", unsafe_allow_html=True)

# Title
st.markdown('<div class="main-title">Grammar Analysis & Dependency Parsing</div>', unsafe_allow_html=True)

# Introduction Section
st.markdown("""
<div class="section">
    <p>Understanding the grammatical structure of sentences underpins many Natural Language Processing (NLP) applications, including translation, text summarization, and information extraction. This page covers Grammar Analysis and Dependency Parsing, which identify the grammatical role of each word in a sentence and the relationships between words.</p>
    <p>We use Spark NLP, a library for NLP on Apache Spark, to perform Part-of-Speech (POS) tagging and Dependency Parsing so that sentences can be analyzed at scale.</p>
</div>
""", unsafe_allow_html=True)

# Understanding Dependency Parsing
st.markdown('<div class="sub-title">Understanding Dependency Parsing</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>Dependency Parsing analyzes the grammatical structure of a sentence by identifying the dependencies between its words. It maps relationships such as subject-verb and adjective-noun, which are essential for understanding the sentence's meaning.</p>
    <p>In Dependency Parsing, every word except the root is linked to exactly one head word, producing a tree-like structure called a dependency tree. This structure supports NLP tasks such as information retrieval, question answering, and machine translation.</p>
</div>
""", unsafe_allow_html=True)

# Implementation Section
st.markdown('<div class="sub-title">Implementing Grammar Analysis & Dependency Parsing</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>The following example implements a grammar analysis pipeline with Spark NLP. The pipeline chains document assembly, tokenization, POS tagging, dependency parsing, and typed dependency parsing to extract the grammatical relationships between the words in a sentence.</p>
</div>
""", unsafe_allow_html=True)

st.code('''
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
import pyspark.sql.functions as F

# Initialize Spark NLP
spark = sparknlp.start()

# Stage 1: Document Assembler
document_assembler = DocumentAssembler()\\
    .setInputCol("text")\\
    .setOutputCol("document")

# Stage 2: Tokenizer
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

# Stage 3: POS Tagger
postagger = PerceptronModel.pretrained("pos_anc", "en")\\
    .setInputCols(["document", "token"])\\
    .setOutputCol("pos")

# Stage 4: Dependency Parsing
dependency_parser = DependencyParserModel.pretrained("dependency_conllu")\\
    .setInputCols(["document", "pos", "token"])\\
    .setOutputCol("dependency")

# Stage 5: Typed Dependency Parsing
typed_dependency_parser = TypedDependencyParserModel.pretrained("dependency_typed_conllu")\\
    .setInputCols(["token", "pos", "dependency"])\\
    .setOutputCol("dependency_type")

# Define the pipeline
pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    postagger,
    dependency_parser,
    typed_dependency_parser
])

# Example sentence
example = spark.createDataFrame([
    ["Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul."]
]).toDF("text")

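# Apply the pipeline. The taggers and parsers are pretrained, so fitting on an
# empty DataFrame is enough to build the PipelineModel before transforming.
result = pipeline.fit(spark.createDataFrame([[""]]).toDF("text")).transform(example)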

# Display the results
result.select(
    F.explode(
        F.arrays_zip(
            result.token.result,
            result.pos.result,
            result.dependency.result,
            result.dependency_type.result
        )
    ).alias("cols")
).select(
    F.expr("cols['0']").alias("token"),
    F.expr("cols['1']").alias("pos"),
    F.expr("cols['2']").alias("dependency"),
    F.expr("cols['3']").alias("dependency_type")
).show(truncate=False)
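
# --- Optional post-processing sketch (plain Python, not a Spark NLP API) ---
# The rows printed by show() above pair each token with its head word. As a
# minimal sketch, the hypothetical helper below groups dependents under their
# head so each subtree can be read at a glance. It assumes the rows have been
# collected into (token, head, relation) tuples, and it keys on the head's
# surface form, so a word that occurs twice in a sentence would be merged.
from collections import defaultdict

def group_by_head(rows):
    """Map each head word to the list of (dependent, relation) pairs under it."""
    children = defaultdict(list)
    for token, head, relation in rows:
        children[head].append((token, relation))
    return dict(children)

# A few rows copied from the expected output of this example
sample_rows = [
    ("Unions", "ROOT", "root"),
    ("say", "Unions", "parataxis"),
    ("they", "disappointed", "nsubj"),
    ("are", "disappointed", "nsubj"),
    ("disappointed", "say", "nsubj"),
]
heads = group_by_head(sample_rows)
print(heads["disappointed"])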
''', language='python')

# Example Output
st.text("""
+------------+---+------------+---------------+
|token       |pos|dependency  |dependency_type|
+------------+---+------------+---------------+
|Unions      |NNP|ROOT        |root           |
|representing|VBG|workers     |amod           |
|workers     |NNS|Unions      |flat           |
|at          |IN |Turner      |case           |
|Turner      |NNP|workers     |flat           |
|Newall      |NNP|say         |nsubj          |
|say         |VBP|Unions      |parataxis      |
|they        |PRP|disappointed|nsubj          |
|are         |VBP|disappointed|nsubj          |
|'           |POS|disappointed|case           |
|disappointed|JJ |say         |nsubj          |
|'           |POS|disappointed|case           |
|after       |IN |talks       |case           |
|talks       |NNS|disappointed|nsubj          |
|with        |IN |stricken    |det            |
|stricken    |NN |talks       |amod           |
|parent      |NN |Mogul       |flat           |
|firm        |NN |Mogul       |flat           |
|Federal     |NNP|Mogul       |flat           |
|Mogul       |NNP|stricken    |flat           |
+------------+---+------------+---------------+
""")

# Visualizing the Dependencies
st.markdown('<div class="sub-title">Visualizing the Dependencies</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>For a visual representation of the dependencies, you can use the <b>spark-nlp-display</b> module, an open-source tool that makes visualizing dependencies straightforward and easy to integrate into your workflow.</p>
    <p>First, install the module with pip:</p>
    <code>pip install spark-nlp-display</code>
    <p>Then, you can use the <code>DependencyParserVisualizer</code> class to create a visualization of the dependency tree:</p>
</div>
""", unsafe_allow_html=True)

st.code('''
from sparknlp_display import DependencyParserVisualizer

# Initialize the visualizer
dependency_vis = DependencyParserVisualizer()

# Display the dependency tree
dependency_vis.display(
    result.collect()[0],  # single example result
    pos_col="pos",
    dependency_col="dependency",
    dependency_type_col="dependency_type",
)
''', language='python')

st.image('images/DependencyParserVisualizer.png', caption='The visualization of dependencies')

st.markdown("""
<div class="section">
    <p>This code snippet generates a visual dependency tree like the one shown above for the given sentence, showing the grammatical relationships between words. The <code>spark-nlp-display</code> module offers a convenient way to inspect complex dependency structures when analyzing sentence grammar.</p>
</div>
""", unsafe_allow_html=True)

# Model Info Section
st.markdown('<div class="sub-title">Choosing the Right Model for Dependency Parsing</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <p>This page uses two pretrained models for dependency parsing: <b>"dependency_conllu"</b>, which links each token to its head word, and <b>"dependency_typed_conllu"</b>, which labels each link with its relation type. Together they extract the grammatical relations between words in English sentences.</p>
    <p>To explore more models tailored for different NLP tasks, visit the <a class="link" href="https://sparknlp.org/models?annotator=DependencyParserModel" target="_blank">Spark NLP Models Hub</a>.</p>
</div>
""", unsafe_allow_html=True)

# References Section
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://nlp.johnsnowlabs.com/docs/en/annotators" target="_blank" rel="noopener">Spark NLP documentation page</a> for all available annotators</li>
        <li>Python API documentation for <a class="link" href="https://nlp.johnsnowlabs.com/api/python/reference/autosummary/sparknlp/annotator/pos/perceptron/index.html#sparknlp.annotator.pos.perceptron.PerceptronModel" target="_blank" rel="noopener">PerceptronModel</a> and <a class="link" href="https://nlp.johnsnowlabs.com/api/python/reference/autosummary/sparknlp/annotator/dependency/dependency_parser/index.html#sparknlp.annotator.dependency.dependency_parser.DependencyParserModel" target="_blank" rel="noopener">DependencyParserModel</a></li>
        <li>Scala API documentation for <a class="link" href="https://nlp.johnsnowlabs.com/api/com/johnsnowlabs/nlp/annotators/pos/perceptron/PerceptronModel.html" target="_blank" rel="noopener">PerceptronModel</a> and <a class="link" href="https://nlp.johnsnowlabs.com/api/com/johnsnowlabs/nlp/annotators/parser/dep/DependencyParserModel.html" target="_blank" rel="noopener">DependencyParserModel</a></li>
        <li>For extended examples of usage of Spark NLP annotators, check the <a class="link" href="https://github.com/JohnSnowLabs/spark-nlp-workshop" target="_blank" rel="noopener">Spark NLP Workshop repository</a>.</li>
        <li>Minsky, M.L. and Papert, S.A. (1969) Perceptrons. MIT Press, Cambridge.</li>
    </ul>
</div>
""", unsafe_allow_html=True)

# Community & Support Section
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
    </ul>
</div>
""", unsafe_allow_html=True)

# Quick Links Section
st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>
        <li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>
        <li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>
    </ul>
</div>
""", unsafe_allow_html=True)