alialialialaiali committed on
Commit
30581aa
·
verified Β·
1 Parent(s): 86fd536

Update README.md

Files changed (1)
  1. README.md +34 -17
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - spider-dataset
 - sql-generation
 - code-generation
- - thesis-research
 datasets:
 - spider
 metrics:
@@ -26,12 +26,13 @@ This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-0.5B](https://huggingf
 
 This model converts natural language questions into SQL queries by leveraging the Qwen2.5-Coder architecture fine-tuned on the comprehensive Spider dataset. The model demonstrates strong performance on cross-domain semantic parsing tasks and can handle complex SQL constructs including joins, aggregations, and nested queries.
 
- - **Developed by:** ALI
 - **Model type:** Causal Language Model (Text-to-SQL)
 - **Language(s):** English
 - **License:** Apache 2.0
 - **Finetuned from model:** Qwen/Qwen2.5-Coder-0.5B
 - **Research Context:** Academic thesis research
 - **Contact:** [email protected]
 
 ### Model Sources
@@ -42,12 +43,27 @@ This model converts natural language questions into SQL queries by leveraging th
 
 ## Performance
 
- **Execution Accuracy Results (100 Spider Dev samples):**
- - **🏆 Execution Accuracy: 33.0%** (33/100 queries returned correct results)
- - **Execution Success Rate: 51.0%** (51/100 queries executed without errors)
- - **Parse Errors: 49/100** (remaining queries had syntax issues)
 
- This represents a significant improvement over base language models for structured SQL generation tasks.
 
 ## Uses
 
@@ -163,7 +179,7 @@ The model was trained on the Spider dataset, a large-scale cross-domain semantic
 
 ### Testing Data & Metrics
 
- **Dataset:** 100 randomly sampled examples from Spider development set
 
 **Evaluation Method:** Execution Accuracy - measuring whether generated SQL queries return the same results as ground truth when executed on actual Spider databases.
 
@@ -174,26 +190,26 @@ The model was trained on the Spider dataset, a large-scale cross-domain semantic
 
 ### Results Summary
 
- The model achieved **33% execution accuracy**, demonstrating competent handling of:
 - ✅ Multi-table joins with proper aliasing
 - ✅ Aggregate functions (COUNT, SUM, AVG) with GROUP BY
 - ✅ Set operations (INTERSECT, EXCEPT, UNION)
 - ✅ Subqueries and nested SELECT statements
 - ✅ Complex WHERE clauses with multiple conditions
 
- **Performance by Query Complexity:**
- - Simple queries (single table): ~60-80% accuracy
- - Medium complexity (joins, aggregations): ~30-40% accuracy
- - Complex queries (nested subqueries): ~15-25% accuracy
 
 ## Limitations and Bias
 
 ### Technical Limitations
 
- - **Parse errors:** 49% of generated queries contain syntax errors
 - **Semantic accuracy:** Model may generate syntactically correct but semantically incorrect queries
- - **Complex reasoning:** Performance degrades on highly complex nested queries
- - **Schema understanding:** Limited ability to infer implicit relationships
 
 ### Recommendations
 
@@ -201,6 +217,7 @@ The model achieved **33% execution accuracy**, demonstrating competent handling
 - **Human review:** Recommend human oversight for production applications
 - **Testing:** Thoroughly test on your specific database schema and domain
 - **Error handling:** Implement robust error handling for parse failures
 
 ## Environmental Impact
 
@@ -228,7 +245,7 @@ Training was conducted on Google Colab infrastructure:
 ```bibtex
 @inproceedings{yu2018spider,
 title={Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
- author={Yu, Tao and Zhang, Rui and Yang, Kai and Yasunaga, Michihiro and Wang, Dongxu and Li, Zifan and Ma, James and Li, Irene and Yao, Qingning and Roman, Shanelle and others},
 booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
 pages={3911--3921},
 year={2018}
 
 - spider-dataset
 - sql-generation
 - code-generation
+ - master-thesis-research
 datasets:
 - spider
 metrics:
 
 
 This model converts natural language questions into SQL queries by leveraging the Qwen2.5-Coder architecture fine-tuned on the comprehensive Spider dataset. The model demonstrates strong performance on cross-domain semantic parsing tasks and can handle complex SQL constructs including joins, aggregations, and nested queries.
 
+ - **Developed by:** Ali Assi
 - **Model type:** Causal Language Model (Text-to-SQL)
 - **Language(s):** English
 - **License:** Apache 2.0
 - **Finetuned from model:** Qwen/Qwen2.5-Coder-0.5B
 - **Research Context:** Academic thesis research
+ - **University:** Lebanese University
 - **Contact:** [email protected]
 
 ### Model Sources
 
 
 ## Performance
 
+ **Overall Performance Summary**
+
+ The model achieved a **39.17% execution accuracy** on the Spider development set, correctly generating 405 out of 1,034 SQL queries. While this represents moderate performance, it demonstrates the model's capability to handle basic to intermediate SQL generation tasks across diverse database domains.
+
+ **Key Performance Metrics:**
+
+ - **🏆 Execution Accuracy: 39.17%** (405/1,034 queries returned correct results)
+ - **Execution Success Rate: 55.80%** (577/1,034 queries executed without errors)
+ - **Parse Error Rate: 44.20%** (457 queries had syntax issues)
+ - **Database Error Rate: 0.00%** (no database-related errors when queries parsed correctly)
+
+ **Key Findings:**
+
+ The evaluation reveals distinct performance characteristics:
+
+ **Execution Statistics:**
+ - **Success Rate**: 55.80% of queries executed successfully, indicating reasonable SQL syntax generation capability
+ - **Parse Errors**: 44.2% of queries failed to parse, highlighting the primary challenge in SQL syntax generation
+ - **Database Validity**: 0% database errors suggest that when queries do parse correctly, they are generally semantically valid for the target schemas
+
+ This pattern indicates that the model's main limitation lies in generating syntactically correct SQL rather than in logical query construction, suggesting potential for improvement through enhanced syntax training or post-processing validation.
 
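The headline rates above are simple ratios over per-query outcomes. A minimal sketch of how such a summary can be computed, assuming each query is labeled `correct`, `wrong_result`, or `parse_error` (an illustrative labeling, not this repo's actual evaluation script):

```python
from collections import Counter

def summarize(outcomes):
    """Compute headline rates from per-query outcome labels.

    `outcomes` holds one label per evaluated query:
      - "correct":      executed and matched the gold results
      - "wrong_result": executed but returned different results
      - "parse_error":  failed to parse or execute
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    return {
        "execution_accuracy": counts["correct"] / total,
        "execution_success_rate": (counts["correct"] + counts["wrong_result"]) / total,
        "parse_error_rate": counts["parse_error"] / total,
    }

# Reproducing the reported counts: 405 correct, 577 executed overall, 457 parse errors.
stats = summarize(["correct"] * 405 + ["wrong_result"] * 172 + ["parse_error"] * 457)
print(f"{stats['execution_accuracy']:.2%}")      # 39.17%
print(f"{stats['execution_success_rate']:.2%}")  # 55.80%
print(f"{stats['parse_error_rate']:.2%}")        # 44.20%
```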
 ## Uses
 
 
 
 ### Testing Data & Metrics
 
+ **Dataset:** Full Spider development set (1,034 examples)
 
 **Evaluation Method:** Execution Accuracy - measuring whether generated SQL queries return the same results as ground truth when executed on actual Spider databases.
 
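Execution accuracy as described above can be approximated with a small harness: run the gold and predicted SQL on the same SQLite database and compare the result sets order-insensitively. A hedged sketch, not the official Spider evaluation code; the schema and queries are made-up stand-ins:

```python
import sqlite3

def execution_match(db, gold_sql, pred_sql):
    """Return True if pred_sql yields the same rows as gold_sql on `db`.

    Rows are compared as sorted multisets, so row order is ignored; a
    prediction that fails to parse or execute counts as a miss.
    """
    cur = db.cursor()
    gold_rows = cur.execute(gold_sql).fetchall()
    try:
        pred_rows = cur.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # parse or semantic error -> not a match
    return sorted(gold_rows) == sorted(pred_rows)

# Tiny made-up schema standing in for a Spider database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO singer VALUES (1, 'A', 'FR'), (2, 'B', 'FR'), (3, 'C', 'US');
""")
print(execution_match(db,
                      "SELECT country, COUNT(*) FROM singer GROUP BY country",
                      "SELECT country, COUNT(id) FROM singer GROUP BY country"))  # True
print(execution_match(db,
                      "SELECT name FROM singer WHERE country = 'FR'",
                      "SELECT name FROM singer WHERE country = 'US'"))  # False
```

Note the official Spider scorer is stricter in places (e.g. handling of ORDER BY and duplicate rows); this sketch only illustrates the core idea.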
 
 
 ### Results Summary
 
+ The model achieved **39.17% execution accuracy** on the complete Spider development set, demonstrating competent handling of:
 - ✅ Multi-table joins with proper aliasing
 - ✅ Aggregate functions (COUNT, SUM, AVG) with GROUP BY
 - ✅ Set operations (INTERSECT, EXCEPT, UNION)
 - ✅ Subqueries and nested SELECT statements
 - ✅ Complex WHERE clauses with multiple conditions
 
+ **Performance Analysis:**
+ - Successfully parsed and executed 55.80% of generated queries
+ - Primary challenge identified in SQL syntax generation (44.2% parse errors)
+ - When syntactically correct, queries demonstrate strong semantic validity (0% database errors)
 
 ## Limitations and Bias
 
 ### Technical Limitations
 
+ - **Parse errors:** 44.2% of generated queries contain syntax errors, representing the primary performance bottleneck
 - **Semantic accuracy:** Model may generate syntactically correct but semantically incorrect queries
+ - **Complex reasoning:** Performance likely degrades on highly complex nested queries
+ - **Schema understanding:** May have limited ability to infer implicit relationships
 
 ### Recommendations
 
 
 - **Human review:** Recommend human oversight for production applications
 - **Testing:** Thoroughly test on your specific database schema and domain
 - **Error handling:** Implement robust error handling for parse failures
+ - **Syntax validation:** Consider implementing SQL syntax validation as a post-processing step
 
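The syntax-validation recommendation above can be implemented cheaply with SQLite itself: preparing `EXPLAIN <query>` pushes the statement through the parser and name resolution against the target schema without running the query's workload. A minimal sketch under that assumption (the function name is illustrative, not from this repo):

```python
import sqlite3

def is_valid_sql(db, query):
    """Check whether `query` parses and resolves against the schema in `db`.

    EXPLAIN compiles the statement (catching syntax errors and unknown
    tables/columns) while only emitting the bytecode listing, so the
    underlying query is not executed against the data.
    """
    try:
        db.execute(f"EXPLAIN {query}")
        return True
    except sqlite3.Error:
        return False

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER, salary REAL)")
print(is_valid_sql(db, "SELECT AVG(salary) FROM employee"))  # True
print(is_valid_sql(db, "SELEC AVG(salary) FROM employee"))   # False: typo in keyword
print(is_valid_sql(db, "SELECT wage FROM employee"))         # False: unknown column
```

Rejected generations could then be retried or flagged for human review rather than executed blindly.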
 ## Environmental Impact
 
 
 ```bibtex
 @inproceedings{yu2018spider,
 title={Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
+ author={Yu, Tao and Zhang, Rui and Yang, Kai and Yasunaga, Michihiro and Wang, Dongxu and Li, Zifan and Ma, James and Li, Irene and Yao, Qingning and Roman, Shanelle and others},
 booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
 pages={3911--3921},
 year={2018}