pavankumarbalijepalli committed · verified
Commit fc7ed7f · 1 Parent(s): e2af9d7

Update README.md

Files changed (1):
  1. README.md +141 -108
README.md CHANGED
@@ -1,114 +1,172 @@
  ---
- library_name: transformers
- tags:
- - research
- - lora
- - nl2sql
- - sql
  license: mit
  datasets:
  - b-mc2/sql-create-context
  language:
  - en
  pipeline_tag: text-generation
  ---

  # Model Card for Model ID

  <!-- Provide a quick summary of what the model is/does. -->
- phi2-nl2sql is a finetuned model of phi-2. The dataset used is `b-mc2/sql-create-context`. This is a base model for the upcoming `gguf` version.

  ## Model Details

  ### Model Description

  <!-- Provide a longer summary of what this model is. -->

- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** Pavan Kumar Balijepalli
- - **Funded by [optional]:** Pavan Kumar Balijepalli
- - **Shared by [optional]:** Pavan Kumar Balijepalli
- - **Model type:** Casual_LM
  - **Language(s) (NLP):** English, SQL
  - **License:** MIT
- - **Finetuned from model [optional]:** microsoft/phi-2

- ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ### Direct Use

  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]

- ### Downstream Use [optional]

  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]

- ### Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- [More Information Needed]

- ## Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]

- ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

- ## How to Get Started with the Model

- Use the code below to get started with the model.

- [More Information Needed]

- ## Training Details

- ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

- ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]

- #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

  ## Evaluation

@@ -120,33 +178,50 @@ Use the code below to get started with the model.

  <!-- This should link to a Dataset Card if possible. -->

- [More Information Needed]

  #### Factors

  <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

- [More Information Needed]

  #### Metrics

  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
  ### Results

- [More Information Needed]

  #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]

  ## Environmental Impact

@@ -154,56 +229,14 @@ Use the code below to get started with the model.

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]

- #### Hardware

- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]

  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]

  ---
  license: mit
  datasets:
  - b-mc2/sql-create-context
  language:
  - en
+ metrics:
+ - accuracy
+ - code_eval
+ library_name: transformers
  pipeline_tag: text-generation
+ tags:
+ - peft
+ - nl2sql
+ widget:
+ - text: "### Task\nGenerate a SQL query to answer the following question:\n`How many heads of the departments are older than 56?`\n\n### Database Schema\nThe query will run on a database with the following schema:\nCREATE TABLE head (age INTEGER)\n\n### Answer\nGiven the database schema, here is the SQL query that answers `How many heads of the departments are older than 56?`:\n```sql"
+   example_title: "One Table"
+ - text: "### Task\nGenerate a SQL query to answer the following question:\n`How many departments are led by heads who are not mentioned?`\n\n### Database Schema\nThe query will run on a database with the following schema:\nCREATE TABLE management (department_id VARCHAR);\nCREATE TABLE department (department_id VARCHAR)\n\n### Answer\nGiven the database schema, here is the SQL query that answers `How many departments are led by heads who are not mentioned?`:\n```sql"
+   example_title: "Two Tables"
  ---

+ # Thanks for being patient! 💜💜
+
  # Model Card for Model ID

  <!-- Provide a quick summary of what the model is/does. -->

+ A fine-tuned version of Phi-2 for the NL2SQL use case, trained on the `b-mc2/sql-create-context` dataset. __*This repository contains just the adapters! You need to load the base Phi-2 model and apply these adapters.*__

  ## Model Details

  ### Model Description

  <!-- Provide a longer summary of what this model is. -->
+ This model was fine-tuned from `microsoft/phi-2` on `b-mc2/sql-create-context`. It performed better than `defog/sqlcoder-7b-2` in terms of inference time and accuracy on the holdout dataset. The evaluation was done with `.gguf` versions of the models on a CPU machine with limited RAM. The average inference times of Phi-2 and SQLCoder are 24 s and 41 s respectively, i.e. Phi-2 is about 41% faster on average, owing to its smaller size. The fine-tuned Phi-2 is also 29% better than SQLCoder in terms of execution success. The major drawback is its 2048-token context window, which requires additional prompt engineering to get results.

+ - **Developed by:** pavankumarbalijepalli
+ - **Model type:** CAUSAL_LM
  - **Language(s) (NLP):** English, SQL
  - **License:** MIT
+ - **Finetuned from model:** [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)

+ ### Model Sources

  <!-- Provide the basic links for the model. -->

+ - **Repository:** [pavankumarbalijepalli/pr-phi2-vs-defog](https://github.com/pavankumarbalijepalli/pr-phi2-vs-defog/)
+ - **Paper:** [BITS Project Paper](https://github.com/pavankumarbalijepalli/pr-phi2-vs-defog/blob/main/2021SC04115%20-%20Final.pdf)

  ## Uses

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

+ The model is meant for cases where you have a natural language question and the database schema relevant to that question, and you want a SQL query that answers it. The context should stay below 2048 tokens. The output is generated in PostgreSQL syntax.
+
  ### Direct Use

  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ ```python
+ # SAME TEMPLATE AS DEFOG MODEL
+ prompt = f"""### Task
+ Generate a SQL query to answer the following question:
+ `{data_point['question']}`
+
+ ### Database Schema
+ The query will run on a database with the following schema:
+ {data_point['context']}
+
+ ### Answer
+ Given the database schema, here is the SQL query that answers `{data_point['question']}`:
+ ```sql"""
+ ```
+
+ ```python
+ # USING ON CPU MACHINE
+ from llama_cpp import Llama
+
+ phi2 = Llama(model_path=f"{path_to_model}/phi2_sqlcoder_f16.gguf")
+
+ response = phi2(prompt=prompt, max_tokens=200, temperature=0.2, stop=['```'])
+
+ print(response['choices'][0]['text'].strip())
+ ```
+
+ ### Downstream Use

  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ ```python
+ # USING ON GPU MACHINE
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ # from peft import PeftModel, PeftConfig
+
+ model_name = "pavankumarbalijepalli/phi2-sqlcoder"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     trust_remote_code=True,
+     device_map="auto"
+ )
+
+ prompt = ""  # fill this in using the prompt template from the Direct Use section
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ tokenizer.pad_token = tokenizer.eos_token
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
+ inputs.to('cuda')
+
+ outputs = model.generate(**inputs, max_length=1000)
+ text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
+ print(text)
+ ```
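+
+ Since this repository ships adapters, you can also load the base `microsoft/phi-2` model and attach them explicitly with 🤗 PEFT. The following is a minimal sketch, assuming the adapter weights in `pavankumarbalijepalli/phi2-sqlcoder` are stored in PEFT format; adjust dtype and device settings for your hardware.
+
+ ```python
+ # LOADING THE ADAPTERS ON TOP OF BASE PHI-2 (illustrative sketch)
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ base_name = "microsoft/phi-2"
+ adapter_name = "pavankumarbalijepalli/phi2-sqlcoder"
+
+ tokenizer = AutoTokenizer.from_pretrained(base_name, trust_remote_code=True)
+ base_model = AutoModelForCausalLM.from_pretrained(
+     base_name,
+     torch_dtype=torch.float16,
+     trust_remote_code=True,
+     device_map="auto",
+ )
+
+ # Attach the fine-tuned NL2SQL adapters to the base model
+ model = PeftModel.from_pretrained(base_model, adapter_name)
+ model.eval()
+ ```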

+ ### Out-of-Scope Use

  <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

+ __Generating Unintended Code:__

+ While the model can translate natural language into SQL queries, it may not be robust enough to handle complex logic or edge cases. Using it to generate critical production code could lead to errors or unexpected behavior in databases.

+ __Security Risks:__

+ NL2SQL models can be susceptible to adversarial attacks where malicious users craft natural language designed to trick the model into generating SQL with security vulnerabilities, such as SQL injection. Always validate generated queries before running them against a real database; a minimal guard is sketched at the end of this section.

+ __Beyond its Training Scope:__

+ The model is trained on a specific SQL dialect (e.g., PostgreSQL). Using it for a different SQL syntax (e.g., MS SQL Server) could lead to inaccurate or nonsensical SQL queries.
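+
+ As a basic mitigation, generated queries can be screened before execution. The helper below is an illustrative sketch (the `is_safe_select` name and its rules are assumptions, not part of this model): it accepts only a single read-only SELECT statement and rejects anything containing data-modifying keywords.
+
+ ```python
+ # Hypothetical guard: allow only single, read-only SELECT statements.
+ import re
+
+ FORBIDDEN = re.compile(
+     r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
+     re.IGNORECASE,
+ )
+
+ def is_safe_select(sql: str) -> bool:
+     """Return True if `sql` looks like a single read-only SELECT statement."""
+     statements = [s for s in sql.split(";") if s.strip()]
+     if len(statements) != 1:
+         return False  # reject multi-statement payloads
+     if not statements[0].lstrip().lower().startswith("select"):
+         return False
+     return not FORBIDDEN.search(statements[0])
+
+ # Only execute the model output if it passes the guard
+ generated_sql = "SELECT COUNT(*) FROM head WHERE age > 56"
+ if is_safe_select(generated_sql):
+     print("query looks read-only; safe to run against a sandboxed copy")
+ ```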

+ ## Bias, Risks, and Limitations

+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ __Bias and Fairness:__

+ The model's training data may contain biases that are reflected in the generated SQL queries. This could lead to unfair or discriminatory outcomes, especially if the data is not carefully curated.

+ __Interpretability and Explainability:__

+ NL2SQL models are often "black boxes" where it's difficult to understand how they translate natural language to SQL. This lack of interpretability makes it challenging to debug errors or ensure the generated queries are safe and efficient.

+ __Replacing Human Expertise:__

+ While the model can automate some SQL query generation tasks, it shouldn't be a complete replacement for human database administrators or analysts. Understanding the data schema and database design is crucial for writing efficient and secure SQL queries.

+ ### Recommendations

+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ The model was fine-tuned on the [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) dataset:
+
+ ```bibtex
+ @misc{b-mc2_2023_sql-create-context,
+   title  = {sql-create-context Dataset},
+   author = {b-mc2},
+   year   = {2023},
+   url    = {https://huggingface.co/datasets/b-mc2/sql-create-context},
+   note   = {This dataset was created by modifying data from the following sources: \cite{zhongSeq2SQL2017, yu2018spider}.},
+ }
+ ```

  ## Evaluation

  <!-- This should link to a Dataset Card if possible. -->

+ The `b-mc2/sql-create-context` data was split into training and testing sets; the holdout (test) split is used to evaluate the model.

  #### Factors

  <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+ The complexity of each question is calculated from the number of tables it references and the number of joins, GROUP BY clauses, and subqueries in its answer. This complexity is used to stratify the train/test split so that the test data covers all difficulty levels; a rough scoring sketch is shown below.

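+ The exact scoring code is not included in this card; the snippet below is an illustrative sketch of how such a complexity score could be computed (the `complexity` function and its equal weighting of factors are assumptions).
+
+ ```python
+ # Hypothetical complexity score used to stratify the test split.
+ import re
+
+ def complexity(context: str, answer_sql: str) -> int:
+     """Count tables, joins, GROUP BYs, and subqueries as a rough difficulty score."""
+     n_tables = len(re.findall(r"\bCREATE TABLE\b", context, re.IGNORECASE))
+     n_joins = len(re.findall(r"\bJOIN\b", answer_sql, re.IGNORECASE))
+     n_group_by = len(re.findall(r"\bGROUP BY\b", answer_sql, re.IGNORECASE))
+     # Crude proxy for subqueries: nested SELECTs beyond the outermost one
+     n_subqueries = max(len(re.findall(r"\bSELECT\b", answer_sql, re.IGNORECASE)) - 1, 0)
+     return n_tables + n_joins + n_group_by + n_subqueries
+
+ print(complexity("CREATE TABLE head (age INTEGER)",
+                  "SELECT COUNT(*) FROM head WHERE age > 56"))  # -> 1
+ ```
+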
  #### Metrics

  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+ * __Execution Success:__ Checks whether the generated query runs without raising any errors. An in-memory sqlite3 connection is opened, dummy tables are created from the context, and the predicted SQL is executed. This verifies that the query is syntactically valid and that the model is not hallucinating columns that do not exist. A minimal sketch of this check follows the list.
+ * __Inference Time:__ Measures which model returns results in less time. Combined with execution success, this reflects the overall efficiency of the model.
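+
+ A minimal sketch of how such an execution-success check can be implemented, assuming the schema in `context` is a series of `CREATE TABLE` statements as in `b-mc2/sql-create-context` (the `executes_ok` helper is illustrative, not the project's exact evaluation code):
+
+ ```python
+ # Hypothetical execution-success check using an in-memory SQLite database.
+ import sqlite3
+
+ def executes_ok(context: str, predicted_sql: str) -> bool:
+     """Create the schema from `context` in memory and try to run `predicted_sql`."""
+     conn = sqlite3.connect(":memory:")
+     try:
+         conn.executescript(context)   # build the dummy tables from the schema
+         conn.execute(predicted_sql)   # raises if syntax or column names are wrong
+         return True
+     except sqlite3.Error:
+         return False
+     finally:
+         conn.close()
+
+ print(executes_ok("CREATE TABLE head (age INTEGER)",
+                   "SELECT COUNT(*) FROM head WHERE age > 56"))  # -> True
+ ```
+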
  ### Results

+ * __Execution Success:__ The fine-tuned Phi-2 has a 29% higher execution-success rate than SQLCoder-7b-2.
+ * __Inference Time:__ The fine-tuned Phi-2 is about 41% faster at inference than SQLCoder-7b-2.

  #### Summary
+ * __Reduced Inference Time and Memory Footprint:__ The fine-tuned Phi-2 model demonstrated a reduction in inference time and memory usage compared to the DeFog SQLCoder. This is attributed to Phi-2's smaller size and the efficiency of quantization techniques employed during fine-tuning. This finding implies that NL2SQL models can be deployed on lower-powered devices like laptops or even mobile phones, potentially democratizing access to this technology for a wider range of users.
+
+ * __Competitive Performance on Easy and Medium Queries:__ The fine-tuned Phi-2 achieved comparable performance to the DeFog SQLCoder in terms of accuracy on easy, medium, and hard difficulty queries. This indicates that Phi-2, despite its smaller size, can effectively handle a significant portion of real-world NL2SQL tasks, especially for simpler queries.
+
+ * __Challenges with Complex Queries:__ While Phi-2 performed well on easier queries, it encountered challenges with complex queries, exhibiting a drop in execution success compared to the DeFog SQLCoder. This highlights the trade-off between model size and complexity, suggesting that larger models might still be necessary for tackling highly intricate tasks.
+
+ * __Potential for Further Improvement:__ The fine-tuning process employed in this study can be further optimized by exploring different hyperparameter configurations and potentially investigating alternative fine-tuning techniques like adapter-based methods. This optimization has the potential to improve the model's performance on complex queries while maintaining its efficiency.

  ## Environmental Impact

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** A100 PCIE 40GB X1
+ - **Hours used:** 18 hours
+ - **Cloud Provider:** Google Cloud
+ - **Compute Region:** Asia-East-1
+ - **Carbon Emitted:** 2.52 kg CO2 eq.

+ ## Citation

  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->