michal-stefanik committed on
Commit
9477969
·
1 Parent(s): 7c7f877

Update README.md

Files changed (1)
  1. README.md +33 -13
README.md CHANGED
@@ -10,29 +10,50 @@ tags:
10
 
11
  # T5 large LM Adapt for Text to SQL
12
 
13
- This model is fine-tuned from the [t5-large-LM-adapt](https://huggingface.co/google/t5-large-lm-adapt) checkpoint.
14
- While training the model on Text2SQL task, the model learns how to generate a SQL query based on the question posed in natural language, however in some cases the SQL query contains unknown columns etc. and altogether does not take the schema of the specific database into account. That is where our approach comes in. We incorporated the database schema into the input question while training, to specify which columns and relations are available to generate an applicable SQL query.
15
 
16
  ## Spider and Spider-Syn dataset
17
 
18
  The model was fine-tuned on the training splits of the [Spider](https://yale-lily.github.io/spider) and [Spider-Syn](https://github.com/ygan/Spider-Syn/tree/main/Spider-Syn) datasets. Instead of using only the questions, we added the database schema to the question, as we wanted the model to generate a query over a given database.
19
 
20
- _input_:
21
 
22
- ```
23
  Question: What is the average, minimum, and maximum age for all French musicians?
24
- Schema: "stadium" "Stadium_ID" int , "Location" text , "Name" text , "Capacity" int , "Highest" int , "Lowest" int , "Average" int , foreign_key: primary key: "Stadium_ID" [SEP] "singer" "Singer_ID" int , "Name" text , "Country" text , "Song_Name" text , "Song_release_year" text , "Age" int , "Is_male" bool , foreign_key: primary key: "Singer_ID" [SEP] "concert" "concert_ID" int , "concert_Name" text , "Theme" text , "Year" text , foreign_key: "Stadium_ID" text from "stadium" "Stadium_ID" , primary key: "concert_ID" [SEP] "singer_in_concert" foreign_key: "concert_ID" int from "concert" "concert_ID" , "Singer_ID" text from "singer" "Singer_ID" , primary key: "concert_ID" "Singer_ID"
25
  ```
26
 
27
- => _target_:
28
 
29
- ```
30
  SELECT avg(age), min(age), max(age) FROM singer WHERE country = 'France'
31
  ```
32
 
33
- When evaluating we query the sqlite database
34
- => _query result_:
35
-
36
  ```
37
  [[34.5, 25, 43]]
38
  ```
@@ -69,10 +90,9 @@ output_text = tokenizer.decode(outputs, skip_special_tokens=True)
69
 
70
  print("SQL Query:")
71
  print(output_text)
72
-
73
  ```
74
- returns:
75
- ```bash
76
  SQL Query:
77
  SELECT avg(age), min(age), max(age) FROM singer WHERE country = 'France'
78
  ```
 
10
 
11
  # T5 large LM Adapt for Text to SQL
12
 
13
+ ### Tl;dr
14
+
15
+ This model is intended to generate structured SQL queries from natural-language prompts.
16
+
17
+ ### Intro
18
+
19
+ In the Text2SQL task, the model learns how to generate a SQL query based on the question posed in natural language.
20
+ However, in some cases the generated SQL query references columns that do not exist and, more generally, does not take the schema of the specific database into account.
21
+
22
+ That is where our approach comes in.
23
+ We incorporated the database schema into the input question while training to specify which columns and relations are available to generate an applicable SQL query.
24
+
25
+ Exposing the database schema together with the prompt allows the model to learn the mapping from the schema to the expected output.
26
+ This allows the model to generalize better to schemas that were not present in the training data.
27
+
28
+ ### Base model
29
+
30
+ We fine-tune this model from the [t5-large-LM-adapt](https://huggingface.co/google/t5-large-lm-adapt) checkpoint.
31
 
32
  ## Spider and Spider-Syn dataset
33
 
34
  The model was fine-tuned on the training splits of the [Spider](https://yale-lily.github.io/spider) and [Spider-Syn](https://github.com/ygan/Spider-Syn/tree/main/Spider-Syn) datasets. Instead of using only the questions, we added the database schema to the question, as we wanted the model to generate a query over a given database.
35
 
36
+ _Input prompt_:
37
 
38
+ ```python
39
  Question: What is the average, minimum, and maximum age for all French musicians?
40
+ Schema: "stadium" "Stadium_ID" int , "Location" text , "Name" text , "Capacity" int , "Highest" int , "Lowest" int ,
41
+ "Average" int , foreign_key: primary key: "Stadium_ID" [SEP] "singer" "Singer_ID" int , "Name" text , "Country" text ,
42
+ "Song_Name" text , "Song_release_year" text , "Age" int , "Is_male" bool ,
43
+ foreign_key: primary key: "Singer_ID" [SEP],
44
+ "concert" "concert_ID" int , "concert_Name" text , "Theme" text , "Year" text , foreign_key: "Stadium_ID" text from "stadium",
45
+ "Stadium_ID" , primary key: "concert_ID" [SEP] "singer_in_concert",
46
+ foreign_key: "concert_ID" int from "concert",
47
+ "concert_ID" , "Singer_ID" text from "singer" "Singer_ID" , primary key: "concert_ID" "Singer_ID"
48
  ```
49
 
50
+ _Expected output_:
51
 
52
+ ```sql
53
  SELECT avg(age), min(age), max(age) FROM singer WHERE country = 'France'
54
  ```
55
 
56
+ When evaluating the output, we query the _SQLite_ database and get:
57
  ```
58
  [[34.5, 25, 43]]
59
  ```
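
The evaluation step described here (run the generated query against an SQLite database and inspect the rows) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the authors' evaluation script: the in-memory `singer` table, its rows, and the helper `run_query` are hypothetical stand-ins for a real Spider database file.

```python
import sqlite3


def run_query(conn, sql):
    """Execute a SQL string and return the rows as a list of lists."""
    return [list(row) for row in conn.execute(sql).fetchall()]


# Hypothetical in-memory database; a real evaluation would open the
# SQLite file that ships with the dataset instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "singer" ('
    '"Singer_ID" int PRIMARY KEY, "Name" text, "Country" text, "Age" int)'
)
# Illustrative rows only (assumption, not taken from the README).
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?, ?)",
    [
        (1, "A", "Netherlands", 52),
        (2, "B", "France", 29),
        (3, "C", "France", 41),
        (4, "D", "France", 43),
        (5, "E", "France", 25),
    ],
)

# The model output from the example above, treated as a plain string.
predicted_sql = (
    "SELECT avg(age), min(age), max(age) FROM singer WHERE country = 'France'"
)
result = run_query(conn, predicted_sql)
print(result)
```

SQLite matches identifiers case-insensitively, which is why the lower-case `age` and `country` in the generated query resolve to the quoted `"Age"` and `"Country"` columns. Comparing `result` with the rows returned by a reference query is one simple way to score a prediction by execution.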
 
90
 
91
  print("SQL Query:")
92
  print(output_text)
 
93
  ```
94
+ outputs:
95
+ ```sql
96
  SQL Query:
97
  SELECT avg(age), min(age), max(age) FROM singer WHERE country = 'France'
98
  ```
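
For reference, the schema serialization used in the input prompt can be approximated in a few lines of Python. This is only a sketch: the `Table` dataclass and the functions `serialize_table` and `build_prompt` are hypothetical helper names, the layout is inferred from the single example in this README, and joining the question and the schema with one space is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Table:
    """Hypothetical container for one table of a database schema."""
    name: str
    # (column name, type) pairs for plain columns
    columns: List[Tuple[str, str]] = field(default_factory=list)
    # (column name, type, referenced table, referenced column)
    foreign_keys: List[Tuple[str, str, str, str]] = field(default_factory=list)
    primary_key: List[str] = field(default_factory=list)


def serialize_table(table: Table) -> str:
    """Render one table in the layout inferred from the README example."""
    cols = "".join(f'"{name}" {ctype} , ' for name, ctype in table.columns)
    fks = "".join(
        f'"{name}" {ctype} from "{ref_table}" "{ref_col}" , '
        for name, ctype, ref_table, ref_col in table.foreign_keys
    )
    pks = " ".join(f'"{name}"' for name in table.primary_key)
    return f'"{table.name}" {cols}foreign_key: {fks}primary key: {pks}'


def build_prompt(question: str, tables: List[Table]) -> str:
    """Join the question and the [SEP]-separated tables into one string."""
    schema = " [SEP] ".join(serialize_table(t) for t in tables)
    # Assumption: question and schema are separated by a single space.
    return f"Question: {question} Schema: {schema}"


concert = Table(
    name="concert",
    columns=[("concert_ID", "int"), ("concert_Name", "text"),
             ("Theme", "text"), ("Year", "text")],
    foreign_keys=[("Stadium_ID", "text", "stadium", "Stadium_ID")],
    primary_key=["concert_ID"],
)
print(build_prompt("How many concerts are there?", [concert]))
```

The resulting string can be passed to the tokenizer in place of a hand-written prompt. Keeping the serialization in one function makes it easier to match the format the model saw during fine-tuning.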