import streamlit as st

# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)

# Main Title
st.markdown('Extract Aspects and Entities from Airline Questions (ATIS dataset)', unsafe_allow_html=True)

# Description
st.markdown("""

Named Entity Recognition (NER) is an NLP task that identifies and classifies key entities in text. For airline questions, NER extracts details such as flight numbers, dates, and locations, which can then be used to automate responses and improve user interaction.

This app focuses on extracting entities from questions related to airline operations, utilizing the ATIS (Airline Travel Information System) dataset. This dataset includes diverse queries about flight schedules, fares, and other travel-related information.

""", unsafe_allow_html=True)

# What is NER
st.markdown('What is Named Entity Recognition (NER)?', unsafe_allow_html=True)

st.markdown("""

Named Entity Recognition (NER) is a process in Natural Language Processing (NLP) that locates and classifies named entities into predefined categories such as person names, organizations, locations, dates, etc. For instance, in the sentence "Flight DL 108 departs from New York on August 1st", NER helps identify 'DL 108' as a flight number, 'New York' as a location, and 'August 1st' as a date.
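
As a rough illustration of how token-level predictions become entities like the ones in that sentence: NER models of this kind typically assign each token an IOB tag (`B-` begins an entity, `I-` continues it, `O` is outside any entity), and consecutive tags are then merged into chunks. The sketch below is plain Python, not Spark NLP code. `iob_to_chunks` is a hypothetical helper, and the tags are hand-written for the example sentence rather than produced by a model.

```python
# Hypothetical helper (not part of Spark NLP): merge IOB-tagged tokens into chunks.
def iob_to_chunks(tokens, tags):
    chunks = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts: flush the previous one, if any.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Same label as the open entity: extend it.
            current_tokens.append(token)
        else:
            # "O" (outside) closes any open entity.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hand-written tags for illustration only; a real model may label these tokens differently.
tokens = ["Flight", "DL", "108", "departs", "from", "New", "York", "on", "August", "1st"]
tags = ["O", "B-flight_number", "I-flight_number", "O", "O",
        "B-fromloc.city_name", "I-fromloc.city_name", "O",
        "B-depart_date.month_name", "B-depart_date.day_number"]

chunks = iob_to_chunks(tokens, tags)
for chunk, label in chunks:
    print(chunk, "->", label)
```

In the Spark NLP pipeline shown later on this page, this merging step is handled by the `NerConverter` stage.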

NER models are trained to understand the context and semantics of entities within text, enabling automated systems to recognize and categorize these entities accurately. This capability is essential for developing intelligent systems capable of processing and responding to user queries efficiently.

""", unsafe_allow_html=True)

# Why We Use NER
st.markdown('Why Use NER for Airline Data?', unsafe_allow_html=True)

st.markdown("""

In the airline industry, answering customer queries often requires extracting specific information from unstructured text. NER helps with:

- **Automating Responses:** Once key entities such as flight numbers, dates, and locations are identified, accurate responses to customer inquiries can be generated automatically.
- **Improving Customer Service:** Faster and more accurate information retrieval leads to improved customer satisfaction.
- **Data Analysis:** Extracted entities can be used to analyze trends, patterns, and anomalies in customer queries.
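
One possible way to use extracted entities for the first and third points is sketched below: the `(chunk, label)` pairs are grouped into slots, a reply is built from the origin and destination slots, and label frequencies are counted. This is only a sketch. The pairs are hand-written for illustration, and `to_slots` and `build_reply` are hypothetical helpers, not part of Spark NLP.

```python
from collections import Counter, defaultdict

# Hand-written (chunk, label) pairs for illustration; not real model output.
extracted = [
    ("Delta Airlines", "airline_name"),
    ("Atlanta", "fromloc.city_name"),
    ("Los Angeles", "toloc.city_name"),
    ("morning", "depart_time.period_of_day"),
]


def to_slots(pairs):
    # Group chunks by entity label so each label maps to a list of values.
    slots = defaultdict(list)
    for chunk, label in pairs:
        slots[label].append(chunk)
    return dict(slots)


def build_reply(slots):
    # Fall back to a placeholder when a slot was not found in the query.
    origin = slots.get("fromloc.city_name", ["an unknown origin"])[0]
    destination = slots.get("toloc.city_name", ["an unknown destination"])[0]
    return f"Searching flights from {origin} to {destination}."


slots = to_slots(extracted)
reply = build_reply(slots)

# Count how often each label occurs, e.g. for trend analysis over many queries.
label_counts = Counter(label for _, label in extracted)

print(reply)
print(label_counts.most_common(2))
```

A real system would likely validate the slots (for example, checking that both an origin and a destination were found) before querying a flight database.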

""", unsafe_allow_html=True)

# Model Details
st.markdown('About the Model', unsafe_allow_html=True)

st.markdown("""

The nerdl_atis_840b_300d model used in this app is a pretrained Spark NLP model for recognizing airline-related entities. It was trained on the ATIS dataset and is used together with the glove_840B_300 word embeddings.

The model recognizes entities such as flight numbers, airport codes, and dates, which makes it suitable for processing airline-related queries.
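
Many of the labels in the predicted-entities list are dotted, such as `fromloc.city_name` or `depart_date.day_name`: the part before the dot says what role the entity plays, and the part after it says what kind of value it is. A minimal sketch of how downstream code might exploit that naming is shown here. `split_label` and `group_by_role` are hypothetical helpers, and the role/attribute reading of the label names is an interpretation, not documented model behavior.

```python
from collections import defaultdict


def split_label(label):
    # Split on the last dot: the part before it is the role (may be absent).
    role, _, attribute = label.rpartition(".")
    return (role or None, attribute)


def group_by_role(labels):
    # Collect attributes under their role; labels without a role go under None.
    grouped = defaultdict(list)
    for label in labels:
        role, attribute = split_label(label)
        grouped[role].append(attribute)
    return dict(grouped)


labels = ["fromloc.city_name", "toloc.city_name", "toloc.airport_code",
          "depart_date.day_name", "airline_name"]
grouped = group_by_role(labels)
print(grouped)
```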

""", unsafe_allow_html=True)

st.write("")

# Predicted Entities
with st.expander("Predicted Entities 80+"):
    st.markdown("""

- `aircraft_code`: Code for the aircraft.
- `airline_code`: Code for the airline.
- `airline_name`: Name of the airline.
- `airport_code`: Code for the airport.
- `airport_name`: Name of the airport.
- `arrive_date.date_relative`: Relative date of arrival.
- `arrive_date.day_name`: Name of the arrival day.
- `arrive_date.day_number`: Day number of the arrival date.
- `arrive_date.month_name`: Name of the arrival month.
- `arrive_date.today_relative`: Arrival date relative to today.
- `arrive_time.end_time`: End time of arrival.
- `arrive_time.period_mod`: Modifier for arrival period.
- `arrive_time.period_of_day`: Period of the day for arrival.
- `arrive_time.start_time`: Start time of arrival.
- `arrive_time.time`: Arrival time.
- `arrive_time.time_relative`: Arrival time relative to another time.
- `city_name`: Name of the city.
- `class_type`: Type of class.
- `connect`: Connection information.
- `cost_relative`: Cost relative to something else.
- `day_name`: Name of the day.
- `day_number`: Number of the day.
- `days_code`: Code for the days.
- `depart_date.date_relative`: Departure date relative to another date.
- `depart_date.day_name`: Name of the departure day.
- `depart_date.day_number`: Number of the departure day.
- `depart_date.month_name`: Name of the departure month.
- `depart_date.today_relative`: Departure date relative to today.
- `depart_date.year`: Year of departure.
- `depart_time.end_time`: End time of departure.
- `depart_time.period_mod`: Modifier for departure period.
- `depart_time.period_of_day`: Period of the day for departure.
- `depart_time.start_time`: Start time of departure.
- `depart_time.time`: Departure time.
- `depart_time.time_relative`: Departure time relative to another time.
- `economy`: Economy class.
- `fare_amount`: Amount of the fare.
- `fare_basis_code`: Fare basis code.
- `flight_days`: Days of the flight.
- `flight_mod`: Modifier for the flight.
- `flight_number`: Flight number.
- `flight_stop`: Flight stop information.
- `flight_time`: Flight time.
- `fromloc.airport_code`: Airport code of the departure location.
- `fromloc.airport_name`: Airport name of the departure location.
- `fromloc.city_name`: City name of the departure location.
- `fromloc.state_code`: State code of the departure location.
- `fromloc.state_name`: State name of the departure location.
- `meal`: Meal information.
- `meal_code`: Meal code.
- `meal_description`: Description of the meal.
- `mod`: Modifier for any entity.
- `month_name`: Name of the month.
- `or`: OR condition.
- `period_of_day`: Period of the day.
- `restriction_code`: Restriction code.
- `return_date.date_relative`: Return date relative to another date.
- `return_date.day_name`: Name of the return day.
- `return_date.day_number`: Number of the return day.
- `return_date.month_name`: Name of the return month.
- `return_date.today_relative`: Return date relative to today.
- `return_time.period_mod`: Modifier for return time period.
- `return_time.period_of_day`: Period of the day for return.
- `round_trip`: Round trip information.
- `state_code`: State code.
- `state_name`: State name.
- `stoploc.airport_name`: Airport name of the stop location.
- `stoploc.city_name`: City name of the stop location.
- `stoploc.state_code`: State code of the stop location.
- `time`: Time information.
- `time_relative`: Time relative to another time.
- `today_relative`: Relative to today.
- `toloc.airport_code`: Airport code of the destination location.
- `toloc.airport_name`: Airport name of the destination location.
- `toloc.city_name`: City name of the destination location.
- `toloc.country_name`: Country name of the destination location.
- `toloc.state_code`: State code of the destination location.
- `toloc.state_name`: State name of the destination location.
- `transport_type`: Type of transport.

""", unsafe_allow_html=True)

# How to use
st.markdown('How to Use the Model', unsafe_allow_html=True)

st.markdown("""

To use this model, follow these steps in Python:

""", unsafe_allow_html=True)

st.code('''
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import col, expr

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Define the components of the pipeline
document_assembler = DocumentAssembler() \\
    .setInputCol("text") \\
    .setOutputCol("document")

tokenizer = Tokenizer() \\
    .setInputCols(["document"]) \\
    .setOutputCol("token")

embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx") \\
    .setInputCols(["document", "token"]) \\
    .setOutputCol("embeddings")

ner_model = NerDLModel.pretrained("nerdl_atis_840b_300d", "en") \\
    .setInputCols(["document", "token", "embeddings"]) \\
    .setOutputCol("ner")

ner_converter = NerConverter() \\
    .setInputCols(["document", "token", "ner"]) \\
    .setOutputCol("ner_chunk")

# Create the pipeline
pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])

# Create some example data
sample_text = """
On August 20, 2024, Delta Airlines flight DL 456, operated with a B737 aircraft, will depart from Hartsfield-Jackson Atlanta International Airport (ATL) located in Atlanta, Georgia, United States. The flight will begin at 10:00 AM local time and is scheduled to conclude its departure process by 10:30 AM, reflecting the morning period of the day. This non-stop flight, classified under business class and costing $850, is set for travel on Monday, Wednesday, and Friday, with the fare basis code J. The flight is categorized as a direct route without any stops, and passengers will enjoy a vegetarian meal (meal code VGML) on board. Upon arrival, the flight will land at Los Angeles International Airport (LAX) in Los Angeles, California, United States at 02:00 PM local time. The arrival process is expected to end by 02:30 PM, placing this in the afternoon period of the day. The flight, part of a round-trip itinerary, will return on August 27, 2024.
The return flight will depart from LAX at 03:00 PM and is scheduled to conclude by 03:30 PM, also reflecting the afternoon period. Both departure and arrival times are stated in the local time zones of their respective locations. The round-trip journey involves non-refundable tickets and features a direct flight with no connecting flights. The entire travel itinerary spans a total of 7 days from departure to return, with all dates relative to today’s date. Passengers should be aware of the restriction code attached to the fare, indicating its non-refundable nature. Additionally, the flight details include flight number, aircraft code, airline code, airline name, class type, fare amount, fare basis code, flight days, and flight stop information.
"""

data = spark.createDataFrame([[sample_text]]).toDF("text")

# Apply the pipeline to the data
model = pipeline.fit(data)
result = model.transform(data)

# Show each extracted chunk with its predicted entity label
result.select(
    expr("explode(ner_chunk) as ner_chunk")
).select(
    col("ner_chunk.result").alias("chunk"),
    col("ner_chunk.metadata").getItem("entity").alias("ner_label")
).show(truncate=False)
''')

# Results
st.text("""
+--------------+-------------------------+
|chunk         |ner_label                |
+--------------+-------------------------+
|20            |depart_time.time         |
|2024          |flight_number            |
|Delta Airlines|airline_name             |
|456           |flight_number            |
|B737          |aircraft_code            |
|Georgia       |airline_name             |
|10:00 AM      |depart_time.time         |
|by            |depart_time.time_relative|
|10:30 AM      |depart_time.time         |
|morning       |depart_time.period_of_day|
|non-stop      |flight_stop              |
|under         |cost_relative            |
|business class|class_type               |
|Monday        |arrive_date.day_name     |
|Wednesday     |arrive_date.day_name     |
|Friday        |arrive_date.day_name     |
|vegetarian    |meal                     |
|meal          |meal                     |
|meal          |meal                     |
|Los Angeles   |toloc.city_name          |
+--------------+-------------------------+
""")

# Benchmarks
st.markdown('Model Benchmarks', unsafe_allow_html=True)

st.markdown("""

The following table shows the performance benchmarks of the nerdl_atis_840b_300d model on the ATIS dataset:

| Metric | Score |
|-----------|-------|
| Precision | 93.5% |
| Recall | 92.7% |
| F1 Score | 93.1% |

These metrics indicate the model's high accuracy in identifying and classifying airline-related entities, making it a robust tool for processing travel-related queries.
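
For readers who want to check how the three numbers relate, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall. `f1_score` is a hypothetical helper written for this page, not a Spark NLP function.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
f1 = f1_score(0.935, 0.927)
print(f"{f1:.1%}")  # prints 93.1%
```

The recomputed value matches the reported F1 score, so the three figures are internally consistent.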

""", unsafe_allow_html=True)

# Conclusion
st.markdown('Conclusion', unsafe_allow_html=True)

st.markdown("""

Named Entity Recognition is a powerful tool for extracting structured information from unstructured text. By leveraging the NerDLModel, you can efficiently process airline-related queries and automate responses with high accuracy.

With its impressive performance metrics and the ability to identify a wide range of entities, this model is well-suited for applications in customer service, data analysis, and beyond in the travel and airline industry.

For further exploration, consider integrating the model into your systems and utilizing the extracted information to enhance user experience and operational efficiency.

""", unsafe_allow_html=True)

# References
st.markdown('References', unsafe_allow_html=True)

st.markdown("""

- NerDLModel annotator documentation
- Model Used: nerdl_atis_840b_300d
- Visualization demos for NER in Spark NLP
- Named Entity Recognition (NER) with BERT in Spark NLP

""", unsafe_allow_html=True)

# Community & Support
st.markdown('Community & Support', unsafe_allow_html=True)

st.markdown("""

- Official Website: Documentation and examples
- GitHub Repository: Report issues or contribute
- Community Forum: Ask questions, share ideas, and get support

""", unsafe_allow_html=True)