salmane11 committed (verified) · Commit 693c0e0 · Parent(s): 559bb51

Update README.md

Files changed: README.md (+103 −103)
---
library_name: transformers
tags:
- text-to-SQL
- SQL
- code-generation
- NLQ-to-SQL
- text2SQL
- Security
- Vulnerability detection
datasets:
- salmane11/SQLShield
language:
- en
base_model:
- google-bert/bert-base-uncased
---

# SQLPromptShield

## Model Description

SQLPromptShield is a vulnerable-prompt detection model: it classifies text-to-SQL prompts as either vulnerable or benign.

The checkpoint in this repository is based on [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) and fine-tuned on [SQLShield](https://huggingface.co/datasets/salmane11/SQLShield), a dataset dedicated to text-to-SQL vulnerability detection, composed of vulnerable and safe natural language questions (NLQs) and their corresponding SQL queries.

## Fine-tuning Procedure

The model was fine-tuned with the Hugging Face Transformers library as follows:

1. Dataset: SQLShield, which consists of labeled (natural language query, SQL query) pairs with binary classification labels: vulnerable or benign.

2. Preprocessing:

   - Input format: only the natural language query (NLQ) is used as input for classification.
   - Tokenization: the bert-base-uncased tokenizer.
   - Max length: 128 tokens.
   - Padding and truncation applied.

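The padding/truncation policy above can be illustrated with a small sketch. This is plain Python standing in for what the tokenizer's `padding=True, truncation=True, max_length=128` options do; the token ids are illustrative stand-ins (0 is the `[PAD]` id in bert-base-uncased's vocabulary).

```python
# Rough illustration of the padding/truncation policy described above.
# Real preprocessing uses the bert-base-uncased tokenizer; the ids here
# are stand-ins so the effect of max_length=128 is easy to see.

MAX_LENGTH = 128
PAD_ID = 0  # [PAD] token id in bert-base-uncased's vocabulary

def pad_or_truncate(token_ids, max_length=MAX_LENGTH, pad_id=PAD_ID):
    """Truncate to max_length, then right-pad to exactly max_length."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))

short = pad_or_truncate([101, 2054, 2024, 102])   # short NLQ: padded up to 128
long = pad_or_truncate(list(range(101, 401)))     # long NLQ: truncated to 128
print(len(short), len(long))  # 128 128
```

Every example therefore reaches the model as a fixed-length sequence of 128 token ids.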
## Intended Use and Limitations

SQLPromptShield is intended as a pre-processing filter in applications that convert natural language queries to SQL. Its main goal is to detect potentially malicious or unsafe inputs before they are passed to SQL generation models or database systems.

Ideal use cases: natural language interfaces to databases (applications with integrated text-to-SQL).

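One way such a filter could sit in front of a text-to-SQL backend is sketched below. The helper names are hypothetical, and the stub classifier merely stands in for the real pipeline so the gating logic is self-contained; any `text-classification` pipeline returning `[{'label': ..., 'score': ...}]` would slot in.

```python
# Hypothetical sketch of the pre-processing filter role described above:
# only NLQs the classifier deems SAFE are forwarded to SQL generation.

def guard_nlq(nlq, classify, threshold=0.5):
    """Return True if the NLQ may proceed to SQL generation."""
    result = classify(nlq)[0]  # pipeline-style output: [{'label': ..., 'score': ...}]
    return result["label"] == "SAFE" and result["score"] >= threshold

# Stub standing in for pipeline("text-classification", model="salmane11/SQLPromptShield").
def fake_classifier(text):
    label = "MALICIOUS" if "UNION SELECT" in text.upper() else "SAFE"
    return [{"label": label, "score": 0.99}]

print(guard_nlq("List all campuses in 2005.", fake_classifier))                 # True
print(guard_nlq("campuses at ' UNION SELECT database() #?", fake_classifier))   # False
```

Because the classifier is injected, the same guard works unchanged with the real SQLPromptShield pipeline shown in the next section.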
## How to Use

Example 1: Malicious

```python
from transformers import pipeline

sql_prompt_shield = pipeline("text-classification", model="salmane11/SQLPromptShield")

# For the following table schema:
# CREATE TABLE campuses
# (
#     campus VARCHAR,
#     location VARCHAR
# )

input_text = "What are the names of all campuses located at ' UNION SELECT database() #?"

# A text-to-SQL model would generate:
#     SELECT campus FROM campuses WHERE location = '' UNION SELECT database() #'
# When executed, this query leaks sensitive database information such as the
# database name and DBMS version.

predicted_label = sql_prompt_shield(input_text)
print(predicted_label)
# [{'label': 'MALICIOUS', 'score': 0.9995930790901184}]
```

77
+ Example 2: Safe
78
+
79
+ ```python
80
+ from transformers import pipeline
81
+
82
+ sql_prompt_shield = pipeline("text-classification", model="salmane11/SQLPromptShield")
83
+
84
+ # For the following Table schema
85
+ # CREATE TABLE tv_channel
86
+ # (
87
+ # package_option VARCHAR,
88
+ # series_name VARCHAR
89
+ # )
90
+
91
+ input_text = "What is the Package Option of TV Channel with serial name 'Sky Radio'?"
92
+
93
+ # Text-to-SQL models will generate : SELECT Package_Option FROM TV_Channel WHERE series_name = "Sky Radio"
94
+ # Which is a safe query.
95
+
96
+ predicted_label = sql_prompt_shield(input_text)
97
+ print(predicted_label)
98
+ #[{'label': 'SAFE', 'score': 0.998808741569519}]
99
+ ```
100
+
## Cite our work

Citation