abdullahmubeen10 committed
Commit 5fea2c5 · verified · 1 Parent(s): b364455

Update pages/Workflow & Model Overview.py

Files changed (1)
  pages/Workflow & Model Overview.py +166 -169
pages/Workflow & Model Overview.py CHANGED
@@ -1,169 +1,166 @@
- import streamlit as st
-
- # Page configuration
- st.set_page_config(
-     layout="wide",
-     initial_sidebar_state="auto"
- )
-
- # Custom CSS for better styling
- st.markdown("""
-     <style>
-         .main-title {
-             font-size: 36px;
-             color: #4A90E2;
-             font-weight: bold;
-             text-align: center;
-         }
-         .sub-title {
-             font-size: 24px;
-             color: #4A90E2;
-             margin-top: 20px;
-         }
-         .section {
-             background-color: #f9f9f9;
-             padding: 15px;
-             border-radius: 10px;
-             margin-top: 20px;
-         }
-         .section h2 {
-             font-size: 22px;
-             color: #4A90E2;
-         }
-         .section p, .section ul {
-             color: #666666;
-         }
-         .link {
-             color: #4A90E2;
-             text-decoration: none;
-         }
-     </style>
- """, unsafe_allow_html=True)
-
- # Title
- st.markdown('<div class="main-title">Chat and Conversational LLMs (Facebook Llama-2)</div>', unsafe_allow_html=True)
-
- # Introduction Section
- st.markdown("""
- <div class="section">
-     <p>Facebook's Llama-2 is a cutting-edge family of large language models (LLMs) designed to excel in a variety of conversational tasks. With models ranging from 7 billion to 70 billion parameters, Llama-2 has been fine-tuned specifically for dialogue use cases, making it one of the most powerful and versatile models available for chat and conversational AI.</p>
-     <p>Llama-2 models have demonstrated superior performance across multiple benchmarks, often outperforming other open-source models and rivaling some of the best closed-source models like ChatGPT and PaLM. These models are capable of handling complex, context-rich conversations with a high degree of accuracy and coherence.</p>
- </div>
- """, unsafe_allow_html=True)
-
- # Llama-2 Transformer Overview
- st.markdown('<div class="sub-title">Understanding the Llama-2 Transformer</div>', unsafe_allow_html=True)
-
- st.markdown("""
- <div class="section">
-     <h2>Llama-2: The Transformer Architecture</h2>
-     <p>Llama-2 is based on the transformer architecture, a deep learning model that has revolutionized the field of natural language processing. The transformer model employs a mechanism called self-attention, which allows it to weigh the importance of different words in a sentence relative to each other. This enables the model to capture long-range dependencies in text, making it highly effective for understanding and generating human-like text.</p>
-     <p>The Llama-2 model family builds on this architecture, incorporating enhancements that improve its ability to handle longer contexts and generate more accurate and coherent responses. The model is particularly well-suited for dialogue and conversational applications, where understanding context and maintaining coherence over multiple turns of conversation is crucial.</p>
- </div>
- """, unsafe_allow_html=True)
-
- # Performance Section
- st.markdown('<div class="sub-title">Performance and Benchmarks</div>', unsafe_allow_html=True)
-
- st.markdown("""
- <div class="section">
-     <p>Llama-2-Chat models have been rigorously tested against a variety of benchmarks to assess their performance in dialogue and conversational tasks. The results have shown that Llama-2 outperforms other open-source chat models on most benchmarks, demonstrating its effectiveness in generating accurate, relevant, and contextually appropriate responses.</p>
-     <p>In human evaluations, Llama-2-Chat has been found to be on par with some of the leading closed-source models in terms of helpfulness and safety. This makes it a highly reliable option for developers looking to implement conversational AI in their applications.</p>
- </div>
- """, unsafe_allow_html=True)
-
- # Implementation Section
- st.markdown('<div class="sub-title">Implementing Llama-2 for Conversational AI</div>', unsafe_allow_html=True)
-
- st.markdown("""
- <div class="section">
-     <p>The following is an example of how to implement a Llama-2 model for generating responses in a conversational AI application. We use the Llama-2 model with a simple Spark NLP pipeline to generate responses to user input.</p>
- </div>
- """, unsafe_allow_html=True)
-
- st.code('''
- from sparknlp.base import *
- from sparknlp.annotator import *
- from pyspark.ml import Pipeline
- from pyspark.sql.functions import col, expr
-
- documentAssembler = DocumentAssembler() \\
-     .setInputCol("text") \\
-     .setOutputCol("documents")
-
- llama2 = LLAMA2Transformer \\
-     .pretrained("llama_2_7b_chat_hf_int4") \\
-     .setMaxOutputLength(50) \\
-     .setDoSample(False) \\
-     .setInputCols(["documents"]) \\
-     .setOutputCol("generation")
-
- pipeline = Pipeline().setStages([documentAssembler, llama2])
-
- data = spark.createDataFrame([["what are your thoughts about the new monkeypox virus"]]).toDF("text")
- result = pipeline.fit(data).transform(data)
- result.select("generation.result").show(truncate=False)
- ''', language='python')
-
- # Example Output
- st.text("""
- +------------------------------------------------+
- |generation.result                               |
- +------------------------------------------------+
- |Monkeypox is a rare disease that has been ...   |
- +------------------------------------------------+
- """)
-
- # Model Info Section
- st.markdown('<div class="sub-title">Choosing the Right Llama-2 Model</div>', unsafe_allow_html=True)
-
- st.markdown("""
- <div class="section">
-     <p>Llama-2 models are available in various sizes and configurations, depending on the specific needs of your application. For conversational AI, it is important to select a model that balances performance with resource efficiency. The model used in the example, "llama_2_7b_chat_hf_int4," is optimized for chat applications and is a good starting point for many use cases.</p>
-     <p>For more complex tasks or larger-scale deployments, you may consider using one of the larger Llama-2 models, such as the 13B or 70B parameter variants, which offer greater accuracy and contextual understanding.</p>
-     <p>Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=LLAMA2Transformer" target="_blank">Spark NLP Models Hub</a> to find the one that fits your needs.</p>
- </div>
- """, unsafe_allow_html=True)
-
- # Footer
- # References Section
- st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
-
- st.markdown("""
- <div class="section">
-     <ul>
-         <li><a class="link" href="https://ai.facebook.com/" target="_blank">Facebook AI Research</a>: Learn more about Facebook's AI initiatives</li>
-         <li><a class="link" href="https://sparknlp.org/models?annotator=LLAMA2Transformer" target="_blank">Spark NLP Model Hub</a>: Explore Llama-2 models</li>
-         <li><a class="link" href="https://huggingface.co/facebook/llama" target="_blank">Hugging Face Model Hub</a>: Explore Llama-2 models</li>
-         <li><a class="link" href="https://github.com/facebookresearch/llama" target="_blank">GitHub</a>: Access the Llama-2 repository and contribute</li>
-         <li><a class="link" href="https://ai.facebook.com/blog/introducing-llama-2" target="_blank">Llama-2 Blog Post</a>: Detailed insights from the developers</li>
-     </ul>
- </div>
- """, unsafe_allow_html=True)
-
- st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
-
- st.markdown("""
- <div class="section">
-     <ul>
-         <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
-         <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
-         <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
-         <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
-         <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
-     </ul>
- </div>
- """, unsafe_allow_html=True)
-
- st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)
-
- st.markdown("""
- <div class="section">
-     <ul>
-         <li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>
-         <li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>
-         <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>
-         <li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>
-     </ul>
- </div>
- """, unsafe_allow_html=True)
 
+ import streamlit as st
+
+ # Page configuration
+ st.set_page_config(
+     layout="wide",
+     initial_sidebar_state="auto"
+ )
+
+ # Custom CSS for better styling
+ st.markdown("""
+     <style>
+         .main-title {
+             font-size: 36px;
+             color: #4A90E2;
+             font-weight: bold;
+             text-align: center;
+         }
+         .sub-title {
+             font-size: 24px;
+             color: #4A90E2;
+             margin-top: 20px;
+         }
+         .section {
+             background-color: #f9f9f9;
+             padding: 15px;
+             border-radius: 10px;
+             margin-top: 20px;
+         }
+         .section h2 {
+             font-size: 22px;
+             color: #4A90E2;
+         }
+         .section p, .section ul {
+             color: #666666;
+         }
+         .link {
+             color: #4A90E2;
+             text-decoration: none;
+         }
+     </style>
+ """, unsafe_allow_html=True)
+
+ # Title
+ st.markdown('<div class="main-title">SQL Query Generation</div>', unsafe_allow_html=True)
+
+ # Introduction Section
+ st.markdown("""
+ <div class="section">
+     <p>SQL Query Generation is a process that involves translating natural language queries into SQL statements. This allows non-technical users to interact with databases by simply asking questions in plain English, making it easier to retrieve specific data without needing to write complex SQL code.</p>
+     <p>In this page, we explore how to implement SQL Query Generation using the T5 model in a Spark NLP pipeline. The T5 model is trained to perform various text-to-text transformations, including translating English queries into SQL commands.</p>
+ </div>
+ """, unsafe_allow_html=True)
+
+ # T5 Transformer Overview
+ st.markdown('<div class="sub-title">T5: The Transformer Architecture</div>', unsafe_allow_html=True)
+
+ st.markdown("""
+ <div class="section">
+     <p>The T5 (Text-To-Text Transfer Transformer) model is designed to handle a wide variety of NLP tasks by treating them as text-to-text problems. In the context of SQL Query Generation, T5 takes a natural language input and translates it into an appropriate SQL query.</p>
+     <p>This model is built on the transformer architecture, which uses self-attention mechanisms to process input text, making it highly effective for understanding and generating text. The T5 model is particularly well-suited for tasks where the goal is to generate text that corresponds to a specific function, like generating SQL queries from plain English questions.</p>
+ </div>
+ """, unsafe_allow_html=True)
+
+ # Performance Section
+ st.markdown('<div class="sub-title">Performance and Benchmarks</div>', unsafe_allow_html=True)
+
+ st.markdown("""
+ <div class="section">
+     <p>T5 models, including the one used for SQL Query Generation, have been tested on various benchmarks that measure their ability to accurately translate natural language into SQL queries. These models have consistently demonstrated strong performance, accurately generating SQL queries that retrieve the correct data from databases.</p>
+     <p>Performance metrics for T5 in this domain show that it can handle a wide range of query types, from simple lookups to more complex queries involving multiple conditions and joins.</p>
+ </div>
+ """, unsafe_allow_html=True)
+
+ # Implementation Section
+ st.markdown('<div class="sub-title">Implementing SQL Query Generation</div>', unsafe_allow_html=True)
+
+ st.markdown("""
+ <div class="section">
+     <p>Below is an example of how to implement SQL Query Generation using a T5 model in a Spark NLP pipeline. This example demonstrates how a natural language query can be transformed into an SQL query that can be executed on a database.</p>
+ </div>
+ """, unsafe_allow_html=True)
+
+ st.code('''
+ from sparknlp.base import *
+ from sparknlp.annotator import *
+ from pyspark.ml import Pipeline
+ from pyspark.sql.functions import col, expr
+
+ documentAssembler = DocumentAssembler() \\
+     .setInputCol("text") \\
+     .setOutputCol("documents")
+
+ t5 = T5Transformer.pretrained("t5_small_wikiSQL") \\
+     .setTask("translate English to SQL:") \\
+     .setInputCols(["documents"]) \\
+     .setMaxOutputLength(200) \\
+     .setOutputCol("sql")
+
+ pipeline = Pipeline().setStages([documentAssembler, t5])
+
+ data = spark.createDataFrame([["How many customers have ordered more than 2 items?"]]).toDF("text")
+ result = pipeline.fit(data).transform(data)
+ result.select("sql.result").show(truncate=False)
+ ''', language='python')
+
+ # Example Output
+ st.text("""
+ +----------------------------------------------------+
+ |result                                              |
+ +----------------------------------------------------+
+ |[SELECT COUNT Customers FROM table WHERE Orders > 2]|
+ +----------------------------------------------------+
+ """)
+
+ # Model Info Section
+ st.markdown('<div class="sub-title">Choosing the Right SQL Model</div>', unsafe_allow_html=True)
+
+ st.markdown("""
+ <div class="section">
+     <p>The T5 model used in the example, "t5_small_wikiSQL," is optimized for SQL query generation from English language input. Depending on the complexity of the queries you need to generate, you may choose a larger or more specialized version of the T5 model.</p>
+     <p>For more complex queries or larger datasets, consider using a model with more parameters, such as T5-base or T5-large, which offer improved performance and accuracy. You can explore different models and find the one that best suits your needs on the <a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Models Hub</a>.</p>
+ </div>
+ """, unsafe_allow_html=True)
+
+ # Footer
+ # References Section
+ st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
+
+ st.markdown("""
+ <div class="section">
+     <ul>
+         <li><a class="link" href="https://ai.googleblog.com/2019/10/exploring-transfer-learning-with-t5.html" target="_blank">Google AI Blog on T5</a>: Learn more about the T5 model</li>
+         <li><a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Model Hub</a>: Explore T5 models</li>
+         <li><a class="link" href="https://huggingface.co/models" target="_blank">Hugging Face Model Hub</a>: Explore various NLP models</li>
+         <li><a class="link" href="https://github.com/google-research/text-to-text-transfer-transformer" target="_blank">GitHub</a>: Access the T5 repository and contribute</li>
+         <li><a class="link" href="https://arxiv.org/abs/1910.10683" target="_blank">T5 Paper</a>: Detailed insights from the developers</li>
+     </ul>
+ </div>
+ """, unsafe_allow_html=True)
+
+ st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
+
+ st.markdown("""
+ <div class="section">
+     <ul>
+         <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
+         <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
+         <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
+         <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
+         <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
+     </ul>
+ </div>
+ """, unsafe_allow_html=True)
+
+ st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)
+
+ st.markdown("""
+ <div class="section">
+     <ul>
+         <li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>
+         <li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>
+         <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>
+         <li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>
+     </ul>
+ </div>
+ """, unsafe_allow_html=True)