alialialialaiali committed on
Commit
30581aa
·
verified Β·
1 Parent(s): 86fd536

Update README.md

Files changed (1)
  1. README.md +34 -17
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - spider-dataset
 - sql-generation
 - code-generation
- - thesis-research
 datasets:
 - spider
 metrics:
@@ -26,12 +26,13 @@ This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-0.5B](https://huggingf
 
 This model converts natural language questions into SQL queries by leveraging the Qwen2.5-Coder architecture fine-tuned on the comprehensive Spider dataset. The model demonstrates strong performance on cross-domain semantic parsing tasks and can handle complex SQL constructs including joins, aggregations, and nested queries.
 
- - **Developed by:** ALI
 - **Model type:** Causal Language Model (Text-to-SQL)
 - **Language(s):** English
 - **License:** Apache 2.0
 - **Finetuned from model:** Qwen/Qwen2.5-Coder-0.5B
 - **Research Context:** Academic thesis research
 - **Contact:** [email protected]
 
 ### Model Sources
@@ -42,12 +43,27 @@ This model converts natural language questions into SQL queries by leveraging th
 
 ## Performance
 
- **Execution Accuracy Results (100 Spider Dev samples):**
- - **🏆 Execution Accuracy: 33.0%** (33/100 queries returned correct results)
- - **Execution Success Rate: 51.0%** (51/100 queries executed without errors)
- - **Parse Errors: 49/100** (remaining queries had syntax issues)
 
- This represents a significant improvement over base language models for structured SQL generation tasks.
 
 ## Uses
 
@@ -163,7 +179,7 @@ The model was trained on the Spider dataset, a large-scale cross-domain semantic
 
 ### Testing Data & Metrics
 
- **Dataset:** 100 randomly sampled examples from Spider development set
 
 **Evaluation Method:** Execution Accuracy - measuring whether generated SQL queries return the same results as ground truth when executed on actual Spider databases.
 
@@ -174,26 +190,26 @@ The model was trained on the Spider dataset, a large-scale cross-domain semantic
 
 ### Results Summary
 
- The model achieved **33% execution accuracy**, demonstrating competent handling of:
 - ✅ Multi-table joins with proper aliasing
 - ✅ Aggregate functions (COUNT, SUM, AVG) with GROUP BY
 - ✅ Set operations (INTERSECT, EXCEPT, UNION)
 - ✅ Subqueries and nested SELECT statements
 - ✅ Complex WHERE clauses with multiple conditions
 
- **Performance by Query Complexity:**
- - Simple queries (single table): ~60-80% accuracy
- - Medium complexity (joins, aggregations): ~30-40% accuracy
- - Complex queries (nested subqueries): ~15-25% accuracy
 
 ## Limitations and Bias
 
 ### Technical Limitations
 
- - **Parse errors:** 49% of generated queries contain syntax errors
 - **Semantic accuracy:** Model may generate syntactically correct but semantically incorrect queries
- - **Complex reasoning:** Performance degrades on highly complex nested queries
- - **Schema understanding:** Limited ability to infer implicit relationships
 
 ### Recommendations
 
@@ -201,6 +217,7 @@ The model achieved **33% execution accuracy**, demonstrating competent handling
 - **Human review:** Recommend human oversight for production applications
 - **Testing:** Thoroughly test on your specific database schema and domain
 - **Error handling:** Implement robust error handling for parse failures
 
 ## Environmental Impact
 
@@ -228,7 +245,7 @@ Training was conducted on Google Colab infrastructure:
 ```bibtex
 @inproceedings{yu2018spider,
 title={Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
- author={Yu, Tao and Zhang, Rui and Yang, Kai and Yasunaga, Michihiro and Wang, Dongxu and Li, Zifan and Ma, James and Li, Irene and Yao, Qingning and Roman, Shanelle and others},
 booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
 pages={3911--3921},
 year={2018}
 
 - spider-dataset
 - sql-generation
 - code-generation
+ - master-thesis-research
 datasets:
 - spider
 metrics:
 
 
 This model converts natural language questions into SQL queries by leveraging the Qwen2.5-Coder architecture fine-tuned on the comprehensive Spider dataset. The model demonstrates strong performance on cross-domain semantic parsing tasks and can handle complex SQL constructs including joins, aggregations, and nested queries.
 
+ - **Developed by:** Ali Assi
 - **Model type:** Causal Language Model (Text-to-SQL)
 - **Language(s):** English
 - **License:** Apache 2.0
 - **Finetuned from model:** Qwen/Qwen2.5-Coder-0.5B
 - **Research Context:** Academic thesis research
+ - **University:** Lebanese University
 - **Contact:** [email protected]
 
 ### Model Sources
 
 
 ## Performance
 
+ **Overall Performance Summary**
+
+ The model achieved a **39.17% execution accuracy** on the Spider development set, correctly generating 405 out of 1,034 SQL queries. While this represents moderate performance, it demonstrates the model's capability to handle basic to intermediate SQL generation tasks across diverse database domains.
+
+ **Key Performance Metrics:**
+
+ - **🏆 Execution Accuracy: 39.17%** (405/1,034 queries returned correct results)
+ - **Execution Success Rate: 55.80%** (577/1,034 queries executed without errors)
+ - **Parse Error Rate: 44.20%** (457 queries had syntax issues)
+ - **Database Error Rate: 0.00%** (no database-related errors when queries parsed correctly)
+
+ **Key Findings:**
+
+ The evaluation reveals distinct performance characteristics:
+
+ **Execution Statistics:**
+ - **Success Rate**: 55.80% of queries executed successfully, indicating reasonable SQL syntax generation capability
+ - **Parse Errors**: 44.2% of queries failed to parse, highlighting the primary challenge in SQL syntax generation
+ - **Database Validity**: 0% database errors suggest that when queries do parse correctly, they are generally semantically valid for the target schemas
+
+ This pattern indicates that the model's main limitation lies in generating syntactically correct SQL rather than in logical query construction, suggesting potential for improvement through enhanced syntax training or post-processing validation.
 
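The headline rates above are simple ratios over per-query outcomes. A minimal sketch of how such a summary can be computed, assuming each query is labeled `correct`, `wrong_result`, or `parse_error` (an illustrative labeling, not this repo's actual evaluation script):

```python
from collections import Counter

def summarize(outcomes):
    """Compute headline rates from per-query outcome labels.

    `outcomes` holds one label per evaluated query:
      - "correct":      executed and matched the gold results
      - "wrong_result": executed but returned different results
      - "parse_error":  failed to parse or execute
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    return {
        "execution_accuracy": counts["correct"] / total,
        "execution_success_rate": (counts["correct"] + counts["wrong_result"]) / total,
        "parse_error_rate": counts["parse_error"] / total,
    }

# Reproducing the reported counts: 405 correct, 577 executed overall, 457 parse errors.
stats = summarize(["correct"] * 405 + ["wrong_result"] * 172 + ["parse_error"] * 457)
print(f"{stats['execution_accuracy']:.2%}")      # 39.17%
print(f"{stats['execution_success_rate']:.2%}")  # 55.80%
print(f"{stats['parse_error_rate']:.2%}")        # 44.20%
```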
 ## Uses
 
 
 
 ### Testing Data & Metrics
 
+ **Dataset:** Full Spider development set (1,034 examples)
 
 **Evaluation Method:** Execution Accuracy - measuring whether generated SQL queries return the same results as ground truth when executed on actual Spider databases.
 
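Execution accuracy as described above can be approximated with a small harness: run the gold and predicted SQL on the same SQLite database and compare the result sets order-insensitively. A hedged sketch, not the official Spider evaluation code; the schema and queries are made-up stand-ins:

```python
import sqlite3

def execution_match(db, gold_sql, pred_sql):
    """Return True if pred_sql yields the same rows as gold_sql on `db`.

    Rows are compared as sorted multisets, so row order is ignored; a
    prediction that fails to parse or execute counts as a miss.
    """
    cur = db.cursor()
    gold_rows = cur.execute(gold_sql).fetchall()
    try:
        pred_rows = cur.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # parse or semantic error -> not a match
    return sorted(gold_rows) == sorted(pred_rows)

# Tiny made-up schema standing in for a Spider database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO singer VALUES (1, 'A', 'FR'), (2, 'B', 'FR'), (3, 'C', 'US');
""")
print(execution_match(db,
                      "SELECT country, COUNT(*) FROM singer GROUP BY country",
                      "SELECT country, COUNT(id) FROM singer GROUP BY country"))  # True
print(execution_match(db,
                      "SELECT name FROM singer WHERE country = 'FR'",
                      "SELECT name FROM singer WHERE country = 'US'"))  # False
```

Note the official Spider scorer is stricter in places (e.g. handling of ORDER BY and duplicate rows); this sketch only illustrates the core idea.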
 
 
 ### Results Summary
 
+ The model achieved **39.17% execution accuracy** on the complete Spider development set, demonstrating competent handling of:
 - ✅ Multi-table joins with proper aliasing
 - ✅ Aggregate functions (COUNT, SUM, AVG) with GROUP BY
 - ✅ Set operations (INTERSECT, EXCEPT, UNION)
 - ✅ Subqueries and nested SELECT statements
 - ✅ Complex WHERE clauses with multiple conditions
 
+ **Performance Analysis:**
+ - Successfully parsed and executed 55.80% of generated queries
+ - Primary challenge identified in SQL syntax generation (44.2% parse errors)
+ - When syntactically correct, queries demonstrate strong semantic validity (0% database errors)
 
 ## Limitations and Bias
 
 ### Technical Limitations
 
+ - **Parse errors:** 44.2% of generated queries contain syntax errors, representing the primary performance bottleneck
 - **Semantic accuracy:** Model may generate syntactically correct but semantically incorrect queries
+ - **Complex reasoning:** Performance likely degrades on highly complex nested queries
+ - **Schema understanding:** May have limited ability to infer implicit relationships
 
 ### Recommendations
 
 
 - **Human review:** Recommend human oversight for production applications
 - **Testing:** Thoroughly test on your specific database schema and domain
 - **Error handling:** Implement robust error handling for parse failures
+ - **Syntax validation:** Consider implementing SQL syntax validation as a post-processing step
 
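The syntax-validation recommendation above can be implemented cheaply with SQLite itself: preparing `EXPLAIN <query>` pushes the statement through the parser and name resolution against the target schema without running the query's workload. A minimal sketch under that assumption (the function name is illustrative, not from this repo):

```python
import sqlite3

def is_valid_sql(db, query):
    """Check whether `query` parses and resolves against the schema in `db`.

    EXPLAIN compiles the statement (catching syntax errors and unknown
    tables/columns) while only emitting the bytecode listing, so the
    underlying query is not executed against the data.
    """
    try:
        db.execute(f"EXPLAIN {query}")
        return True
    except sqlite3.Error:
        return False

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER, salary REAL)")
print(is_valid_sql(db, "SELECT AVG(salary) FROM employee"))  # True
print(is_valid_sql(db, "SELEC AVG(salary) FROM employee"))   # False: typo in keyword
print(is_valid_sql(db, "SELECT wage FROM employee"))         # False: unknown column
```

Rejected generations could then be retried or flagged for human review rather than executed blindly.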
 ## Environmental Impact
 
 
 ```bibtex
 @inproceedings{yu2018spider,
 title={Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
+ author={Yu, Tao and Zhang, Rui and Yang, Kai and Yasunaga, Michihiro and Wang, Dongxu and Li, Zifan and Ma, James and Li, Irene and Yao, Qingning and Roman, Shanelle and others},
 booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
 pages={3911--3921},
 year={2018}