Files changed (1)
  1. README.md +98 -84
README.md CHANGED
@@ -1,84 +1,98 @@
- ---
- library_name: transformers
- license: mit
- datasets:
- - gretelai/synthetic_text_to_sql
- base_model:
- - Qwen/Qwen2.5-3B-Instruct
- pipeline_tag: text-generation
- ---
- # Fine-Tuned LLM for Text-to-SQL Conversion
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) designed to convert natural language queries into SQL statements. It was trained on the `gretelai/synthetic_text_to_sql` dataset and can provide both SQL queries and table schema context when needed.
-
- ---
-
- ## Model Details
-
- ### Model Description
-
- This model has been fine-tuned to help users generate SQL queries based on natural language prompts. In scenarios where table schema context is missing, the model is trained to generate schema definitions along with the SQL query, making it a robust solution for various Text-to-SQL tasks.
-
- - **Base Model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- - **Dataset:** [Gretel AI Synthetic Text-to-SQL Dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)
- - **Language:** English
- - **License:** MIT
-
- ### Key Features
-
- 1. **Text-to-SQL Conversion:** Converts natural language queries into accurate SQL statements.
- 2. **Schema Generation:** Generates table schema context when none is provided.
- 3. **Optimized for Analytics and Reporting:** Handles SQL queries with aggregation, grouping, and filtering.
-
- ---
-
- ## Usage
-
- ### Direct Use
-
- To use the model for text-to-SQL conversion, you can load it using the `transformers` library as shown below:
-
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- tokenizer = AutoTokenizer.from_pretrained("Ellbendls/Qwen-2.5-3b-Text_to_SQL")
- model = AutoModelForCausalLM.from_pretrained("Ellbendls/Qwen-2.5-3b-Text_to_SQL")
-
- # Input prompt
- query = "What is the total number of hospital beds in each state?"
-
- # Tokenize input and generate output
- inputs = tokenizer(query, return_tensors="pt")
- outputs = model.generate(**inputs, max_length=512)
-
- # Decode and print
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- ```
-
- ### Example Output
- Input:
- `What is the total number of hospital beds in each state?`
-
- Output:
- ```sql
- Context:
- CREATE TABLE Beds (State VARCHAR(50), Beds INT);
- INSERT INTO Beds (State, Beds) VALUES ('California', 100000), ('Texas', 85000), ('New York', 70000);
-
- SQL Query:
- SELECT State, SUM(Beds) FROM Beds GROUP BY State;
- ```
-
- ---
-
- ## Training Details
-
- ### Dataset
-
- The model was fine-tuned on the `gretelai/synthetic_text_to_sql` dataset, which includes diverse natural language queries mapped to SQL queries, with optional schema contexts.
-
- ## Limitations
-
- 1. **Complex Queries:** May struggle with highly nested or advanced SQL tasks.
- 2. **Non-English Prompts:** Optimized for English only.
- 3. **Context Dependence:** May generate incorrect schemas without explicit instructions.
+ ---
+ library_name: transformers
+ license: mit
+ datasets:
+ - gretelai/synthetic_text_to_sql
+ base_model:
+ - Qwen/Qwen2.5-3B-Instruct
+ pipeline_tag: text-generation
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+ # Fine-Tuned LLM for Text-to-SQL Conversion
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) designed to convert natural language queries into SQL statements. It was trained on the `gretelai/synthetic_text_to_sql` dataset and can provide both SQL queries and table schema context when needed.
+
+ ---
+
+ ## Model Details
+
+ ### Model Description
+
+ This model has been fine-tuned to generate SQL queries from natural-language prompts. When no table schema is provided, it is trained to generate schema definitions along with the SQL query, so it can be used both with and without schema context.
+
+ - **Base Model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
+ - **Dataset:** [Gretel AI Synthetic Text-to-SQL Dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)
+ - **Language:** English
+ - **License:** MIT
+
+ ### Key Features
+
+ 1. **Text-to-SQL Conversion:** Converts natural-language queries into SQL statements.
+ 2. **Schema Generation:** Generates table schema context when none is provided.
+ 3. **Optimized for Analytics and Reporting:** Handles SQL queries with aggregation, grouping, and filtering.
+
+ ---
+
+ ## Usage
+
+ ### Direct Use
+
+ To use the model for text-to-SQL conversion, you can load it using the `transformers` library as shown below:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("Ellbendls/Qwen-2.5-3b-Text_to_SQL")
+ model = AutoModelForCausalLM.from_pretrained("Ellbendls/Qwen-2.5-3b-Text_to_SQL")
+
+ # Input prompt
+ query = "What is the total number of hospital beds in each state?"
+
+ # Tokenize input; max_new_tokens caps only the generated tokens, not the prompt
+ inputs = tokenizer(query, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=512)
+
+ # Decode only the newly generated tokens so the prompt is not echoed back
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
+ ```
+
+ ### Example Output
+ Input:
+ `What is the total number of hospital beds in each state?`
+
+ Output:
+ ```sql
+ Context:
+ CREATE TABLE Beds (State VARCHAR(50), Beds INT);
+ INSERT INTO Beds (State, Beds) VALUES ('California', 100000), ('Texas', 85000), ('New York', 70000);
+
+ SQL Query:
+ SELECT State, SUM(Beds) FROM Beds GROUP BY State;
+ ```
+
+ ---
+
+ ## Training Details
+
+ ### Dataset
+
+ The model was fine-tuned on the `gretelai/synthetic_text_to_sql` dataset, which includes diverse natural language queries mapped to SQL queries, with optional schema contexts.
+
+ ## Limitations
+
+ 1. **Complex Queries:** May struggle with highly nested or advanced SQL tasks.
+ 2. **Non-English Prompts:** Optimized for English only.
+ 3. **Context Dependence:** May generate incorrect schemas without explicit instructions.
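The card's Example Output pairs a generated `Context:` block (a schema plus sample rows) with the SQL query, and the Limitations note that generated schemas can be wrong. One way to sanity-check output in that layout is to execute it in a throwaway in-memory SQLite database. This is a minimal sketch, not part of the model card: it assumes generations follow the `Context:` / `SQL Query:` layout shown in the example, and `split_output` and `run_generated_sql` are hypothetical helper names.

```python
import sqlite3

# Generated text copied verbatim from the card's "Example Output" section.
MODEL_OUTPUT = """Context:
CREATE TABLE Beds (State VARCHAR(50), Beds INT);
INSERT INTO Beds (State, Beds) VALUES ('California', 100000), ('Texas', 85000), ('New York', 70000);

SQL Query:
SELECT State, SUM(Beds) FROM Beds GROUP BY State;
"""


def split_output(text: str) -> tuple[str, str]:
    """Split generated text into (context, query) at the 'SQL Query:' label.

    Hypothetical helper: assumes the 'Context:' / 'SQL Query:' layout from the
    card's example; real generations may not always follow it.
    """
    head, label, query = text.partition("SQL Query:")
    if not label:
        raise ValueError("model output has no 'SQL Query:' label")
    context = head.replace("Context:", "", 1)
    return context.strip(), query.strip()


def run_generated_sql(text: str) -> list[tuple]:
    """Run the generated context, then the query, in a throwaway in-memory database."""
    context, query = split_output(text)
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(context)  # CREATE TABLE and INSERT statements
        return conn.execute(query).fetchall()  # rows returned by the SELECT
    finally:
        conn.close()


if __name__ == "__main__":
    for state, beds in run_generated_sql(MODEL_OUTPUT):
        print(state, beds)
```

If you supply your own schema instead of relying on the generated one, execute your own `CREATE TABLE` statements in place of the generated context; a query that references a missing table or column then raises `sqlite3.OperationalError`. A failure can also reflect a dialect difference, since SQLite is not necessarily the SQL dialect the model targets.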