hjd commited on
Commit
139aed4
·
verified ·
1 Parent(s): 234fe5a

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +88 -14
README.md CHANGED
@@ -35,15 +35,23 @@ Test results on multiple benchmark datasets show the model exceeds other existin
35
  - On the WikiSQL dataset, the model achieved an execution accuracy rate of 87.5%.
36
  - On the Spider dataset, the model achieved an execution accuracy rate of 95.3%.
37
 
 
 
 
 
 
 
38
  These results show the model has significant advantages in handling complex queries and multi-table joins.
39
 
40
- ## 2. Usage:
 
 
41
  Please upgrade the `transformers` package to ensure it supports Llama3 models. The current version we are using is `4.41.2`.
42
  ```python
43
  # Use a pipeline as a high-level helper
44
  from transformers import pipeline
45
  import torch
46
- model_id = "xbrain/text2sql-8b-instruct-v1"
47
 
48
 
49
  messages = [
@@ -66,29 +74,95 @@ print(outputs[0]["generated_text"][-1])
66
  ```
67
 
68
  Here are some exciting examples:
 
 
69
 
70
- **Example 1:**
71
- - **model input:**
72
- {"role": "system",
73
  "content": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\ndog_kennels contains tables such as Breeds, Charges, Sizes, Treatment_Types, Owners, Dogs, Professionals, Treatments. Table Breeds has columns such as breed_code, breed_name. breed_code is the primary key.\nTable Charges has columns such as charge_id, charge_type, charge_amount. charge_id is the primary key.\nTable Sizes has columns such as size_code, size_description. size_code is the primary key.\nTable Treatment_Types has columns such as treatment_type_code, treatment_type_description. treatment_type_code is the primary key.\nTable Owners has columns such as owner_id, first_name, last_name, street, city, state, zip_code, email_address, home_phone, cell_number. owner_id is the primary key.\nTable Dogs has columns such as dog_id, owner_id, abandoned_yn, breed_code, size_code, name, age, date_of_birth, gender, weight, date_arrived, date_adopted, date_departed. dog_id is the primary key.\nTable Professionals has columns such as professional_id, role_code, first_name, street, city, state, zip_code, last_name, email_address, home_phone, cell_number. professional_id is the primary key.\nTable Treatments has columns such as treatment_id, dog_id, professional_id, treatment_type_code, date_of_treatment, cost_of_treatment. treatment_id is the primary key.\nThe owner_id of Dogs is the foreign key of owner_id of Owners.\nThe owner_id of Dogs is the foreign key of owner_id of Owners.\nThe size_code of Dogs is the foreign key of size_code of Sizes.\nThe breed_code of Dogs is the foreign key of breed_code of Breeds.\nThe dog_id of Treatments is the foreign key of dog_id of Dogs.\nThe professional_id of Treatments is the foreign key of professional_id of Professionals.\nThe treatment_type_code of Treatments is the foreign key of treatment_type_code of Treatment_Types.\n\n"},
74
 
75
- {"role": "user",
76
  "content": "###Input:\n列出每种治疗的费用和相应的治疗类型描述。\n\n###Response:"},
77
 
78
- - **model output:**
79
- {'role': 'assistant', 'content': 'SELECT T1.cost_of_treatment, T2.treatment_type_description FROM Treatments AS T1 JOIN treatment_types AS T2 ON T1.treatment_type_code = T2.treatment_type_code'}
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
80
 
 
 
 
81
 
82
- **Example 2:**
83
- - **model input:**
84
- {"role": "system",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
85
  "content": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\n database contains tables such as medication, vitalperiodic, diagnosis, cost, lab, microlab, intakeoutput, treatment, patient, allergy. Table medication has columns such as medicationid, patientunitstayid, drugname, dosage, routeadmin, drugstarttime, drugstoptime. Table vitalperiodic has columns such as vitalperiodicid, patientunitstayid, temperature, sao2, heartrate, respiration, systemicsystolic, systemicdiastolic, systemicmean, observationtime. Table diagnosis has columns such as diagnosisid, patientunitstayid, diagnosisname, diagnosistime, icd9code. Table cost has columns such as costid, uniquepid, patienthealthsystemstayid, eventtype, eventid, chargetime, cost. Table lab has columns such as labid, patientunitstayid, labname, labresult, labresulttime. Table microlab has columns such as microlabid, patientunitstayid, culturesite, organism, culturetakentime. Table intakeoutput has columns such as intakeoutputid, patientunitstayid, cellpath, celllabel, cellvaluenumeric, intakeoutputtime. Table treatment has columns such as treatmentid, patientunitstayid, treatmentname, treatmenttime. Table patient has columns such as uniquepid, patienthealthsystemstayid, patientunitstayid, gender, age, ethnicity, hospitalid, wardid, admissionheight, admissionweight, dischargeweight, hospitaladmittime, hospitaladmitsource, unitadmittime, unitdischargetime, hospitaldischargetime, hospitaldischargestatus. Table allergy has columns such as allergyid, patientunitstayid, drugname, allergyname, allergytime. \n"},
86
 
87
- {"role": "user",
88
  "content": "###Input:\n did the first urine, catheter specimen microbiology test of patient 031-15666 in the last month show any organism?\n\n###Response:"},
89
 
90
- - **model output:**
91
- {'role': 'assistant', 'content': "SELECT COUNT(*) > 0 FROM microlab WHERE microlab.patientunitstayid IN (SELECT patient.patientunitstayid FROM patient WHERE patient.patienthealthsystemstayid IN (SELECT patient.patienthealthsystemstayid FROM patient WHERE patient.uniquepid = '031-15666')) AND microlab.culturesite = 'urine, catheter specimen' AND DATETIME(microlab.culturetakentime,'start of month') = DATETIME(CURRENT_TIME(),'start of month', '-1 month') ORDER BY microlab.culturetakentime LIMIT 1"}
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
92
 
93
 
94
 
 
35
  - On the WikiSQL dataset, the model achieved an execution accuracy rate of 87.5%.
36
  - On the Spider dataset, the model achieved an execution accuracy rate of 95.3%.
37
 
38
+ ### 1.4 functions and advantages
39
+ 1. **Semantic Understanding**: It can comprehend complex natural language requests and extract key elements such as patient ID, test type, time range, etc.
40
+ 2. **Automated Query Generation**: Generates correct and optimized SQL queries based on the database structure and table relationships, eliminating the need for manual code writing.
41
+ 3. **Handling Complex Conditions**: Manages nested queries, multi-table joins, and complex time conditions, which might be error-prone or time-consuming if done manually.
42
+ 4. **Efficiency**: Quickly generates queries, significantly enhancing productivity, especially in scenarios requiring frequent database queries.
43
+ 5. **High Adaptability**: Suitable for various domains' data querying needs, only requiring the relevant database structure.
44
  These results show the model has significant advantages in handling complex queries and multi-table joins.
45
 
46
+
47
+
48
+ ## 2. Usage and Explanations:
49
  Please upgrade the `transformers` package to ensure it supports Llama3 models. The current version we are using is `4.41.2`.
50
  ```python
51
  # Use a pipeline as a high-level helper
52
  from transformers import pipeline
53
  import torch
54
+ model_id = "xbrain/AutoSQL-nl2sql-1.0-8b"
55
 
56
 
57
  messages = [
 
74
  ```
75
 
76
  Here are some exciting examples:
77
+ ### Example-1:
78
+ **The following are the inputs, outputs and explanations of the model:**
79
 
80
+ #### model input:
81
+
82
+ {"role": "system",
83
  "content": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\ndog_kennels contains tables such as Breeds, Charges, Sizes, Treatment_Types, Owners, Dogs, Professionals, Treatments. Table Breeds has columns such as breed_code, breed_name. breed_code is the primary key.\nTable Charges has columns such as charge_id, charge_type, charge_amount. charge_id is the primary key.\nTable Sizes has columns such as size_code, size_description. size_code is the primary key.\nTable Treatment_Types has columns such as treatment_type_code, treatment_type_description. treatment_type_code is the primary key.\nTable Owners has columns such as owner_id, first_name, last_name, street, city, state, zip_code, email_address, home_phone, cell_number. owner_id is the primary key.\nTable Dogs has columns such as dog_id, owner_id, abandoned_yn, breed_code, size_code, name, age, date_of_birth, gender, weight, date_arrived, date_adopted, date_departed. dog_id is the primary key.\nTable Professionals has columns such as professional_id, role_code, first_name, street, city, state, zip_code, last_name, email_address, home_phone, cell_number. professional_id is the primary key.\nTable Treatments has columns such as treatment_id, dog_id, professional_id, treatment_type_code, date_of_treatment, cost_of_treatment. treatment_id is the primary key.\nThe owner_id of Dogs is the foreign key of owner_id of Owners.\nThe owner_id of Dogs is the foreign key of owner_id of Owners.\nThe size_code of Dogs is the foreign key of size_code of Sizes.\nThe breed_code of Dogs is the foreign key of breed_code of Breeds.\nThe dog_id of Treatments is the foreign key of dog_id of Dogs.\nThe professional_id of Treatments is the foreign key of professional_id of Professionals.\nThe treatment_type_code of Treatments is the foreign key of treatment_type_code of Treatment_Types.\n\n"},
84
 
85
+ {"role": "user",
86
  "content": "###Input:\n列出每种治疗的费用和相应的治疗类型描述。\n\n###Response:"},
87
 
88
+ #### model output:
89
+ {'role': 'assistant', 'content': 'SELECT T1.cost_of_treatment, T2.treatment_type_description FROM Treatments AS T1 JOIN treatment_types AS T2 ON T1.treatment_type_code = T2.treatment_type_code'}
90
+
91
+
92
+ #### Explanations:
93
+ In this specific example, the role and functionality of the model can be detailed as follows:
94
+
95
+ **Understanding Natural Language Input**
96
+
97
+ The user's natural language input is:
98
+ ```
99
+ 列出每种治疗的费用和相应的治疗类型描述
100
+ ```
101
+
102
+ The model first needs to understand the meaning of this Chinese description, identifying that the user wants to retrieve data about "the cost of each treatment" and "the corresponding treatment type description."
103
+
104
+ **Mapping to Database Structure**
105
 
106
+ Next, the model maps the user's requirements to specific tables and columns in the database. Given the defined structure of the database, the model identifies the relevant tables and columns:
107
+ - **Table Treatments** includes columns like `treatment_id`, `dog_id`, `professional_id`, `treatment_type_code`, `date_of_treatment`, `cost_of_treatment`, etc.
108
+ - **Table Treatment_Types** includes columns like `treatment_type_code`, `treatment_type_description`, etc.
109
 
110
+ Based on the requirement, "the cost of each treatment" corresponds to the `cost_of_treatment` column in the `Treatments` table, and "the corresponding treatment type description" corresponds to the `treatment_type_description` column in the `Treatment_Types` table.
111
+
112
+ **Generating the SQL Query**
113
+
114
+ The model then generates the SQL query to fulfill the user's request. The specific SQL statement generated is:
115
+ ```sql
116
+ SELECT T1.cost_of_treatment, T2.treatment_type_description
117
+ FROM Treatments AS T1
118
+ JOIN Treatment_Types AS T2
119
+ ON T1.treatment_type_code = T2.treatment_type_code;
120
+ ```
121
+
122
+
123
+ The model can handle not just simple queries but also complex ones involving multiple tables through JOIN operations, meeting diverse data requirements.
124
+
125
+
126
+ ### Example-2:
127
+ **The following are the inputs, outputs and explanations of the model:**
128
+ #### model input:
129
+ {"role": "system",
130
  "content": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\n database contains tables such as medication, vitalperiodic, diagnosis, cost, lab, microlab, intakeoutput, treatment, patient, allergy. Table medication has columns such as medicationid, patientunitstayid, drugname, dosage, routeadmin, drugstarttime, drugstoptime. Table vitalperiodic has columns such as vitalperiodicid, patientunitstayid, temperature, sao2, heartrate, respiration, systemicsystolic, systemicdiastolic, systemicmean, observationtime. Table diagnosis has columns such as diagnosisid, patientunitstayid, diagnosisname, diagnosistime, icd9code. Table cost has columns such as costid, uniquepid, patienthealthsystemstayid, eventtype, eventid, chargetime, cost. Table lab has columns such as labid, patientunitstayid, labname, labresult, labresulttime. Table microlab has columns such as microlabid, patientunitstayid, culturesite, organism, culturetakentime. Table intakeoutput has columns such as intakeoutputid, patientunitstayid, cellpath, celllabel, cellvaluenumeric, intakeoutputtime. Table treatment has columns such as treatmentid, patientunitstayid, treatmentname, treatmenttime. Table patient has columns such as uniquepid, patienthealthsystemstayid, patientunitstayid, gender, age, ethnicity, hospitalid, wardid, admissionheight, admissionweight, dischargeweight, hospitaladmittime, hospitaladmitsource, unitadmittime, unitdischargetime, hospitaldischargetime, hospitaldischargestatus. Table allergy has columns such as allergyid, patientunitstayid, drugname, allergyname, allergytime. \n"},
131
 
132
+ {"role": "user",
133
  "content": "###Input:\n did the first urine, catheter specimen microbiology test of patient 031-15666 in the last month show any organism?\n\n###Response:"},
134
 
135
+ #### model output:
136
+ {'role': 'assistant', 'content': "SELECT COUNT(*) > 0 FROM microlab WHERE microlab.patientunitstayid IN (SELECT patient.patientunitstayid FROM patient WHERE patient.patienthealthsystemstayid IN (SELECT patient.patienthealthsystemstayid FROM patient WHERE patient.uniquepid = '031-15666')) AND microlab.culturesite = 'urine, catheter specimen' AND DATETIME(microlab.culturetakentime,'start of month') = DATETIME(CURRENT_TIME(),'start of month', '-1 month') ORDER BY microlab.culturetakentime LIMIT 1"}
137
+
138
+ #### Explanations:
139
+
140
+ **Understanding Natural Language Input:**
141
+ In this example, the model understood the user's intention, and generated the following SQL query based on the given database structure:
142
+
143
+ ```sql
144
+ SELECT COUNT(*) > 0
145
+ FROM microlab
146
+ WHERE microlab.patientunitstayid IN (
147
+ SELECT patient.patientunitstayid
148
+ FROM patient
149
+ WHERE patient.patienthealthsystemstayid IN (
150
+ SELECT patient.patienthealthsystemstayid
151
+ FROM patient
152
+ WHERE patient.uniquepid = '031-15666'
153
+ )
154
+ )
155
+ AND microlab.culturesite = 'urine, catheter specimen'
156
+ AND DATETIME(microlab.culturetakentime,'start of month') = DATETIME(CURRENT_TIME(),'start of month', '-1 month')
157
+ ORDER BY microlab.culturetakentime
158
+ LIMIT 1;
159
+ ```
160
+
161
+ **Specific Performance**
162
+ - **Identifying Objects**: The model identified that the patient of interest is "031-15666".
163
+ - **Detail Extraction**: Understood that the user wants to check the microbiology test for "urine, catheter specimen".
164
+ - **Time Filtering**: Recognized that the user is interested in tests conducted "in the last month" and performed the necessary time conversion.
165
+ - **Result Limitation**: Output whether the corresponding record exists (COUNT(*) > 0), and only looked for the first matching record (LIMIT 1).
166
 
167
 
168