markrodrigo committed on
Commit
a915ad0
·
verified ·
1 Parent(s): 0549a17

Update README.md

Files changed (1)
  1. README.md +175 -3
README.md CHANGED
@@ -1,3 +1,175 @@
1
- ---
2
- license: llama3.1
3
- ---
1
+ ---
2
+ license: llama3.1
3
+ language:
4
+ - en
5
+ pipeline_tag: text2text-generation
6
+ ---
7
+
8
+ ### Model Information
9
+
10
+ This model, Llama-3.1-8B-Instruct-Spatial-SQL-1.0, is an 8B model lightly fine-tuned for a narrow use case: text to spatial SQL. In general, its primary use case
11
+ is translating natural-language commands into particular geographic spatial functions as normally defined in pure SQL. Input should combine an English question as a prefix with coordinates injected into the prompt, likely taken from the coordinate list of an active mapping application. Output is PostGIS spatial SQL.
12
+
13
+ There are four primary geographic functions released in version 1.0.
14
+
15
+ **Model developer**: Mark Rodrigo
16
+
17
+ **Model Architecture**: The model was fine-tuned with QLoRA / Supervised Fine-Tuning (SFT).
18
+
19
+ ### Model Input / Output Overview:
20
+
21
+ Input: Text plus coordinate prompt injection.
22
+ Output: **PostGIS spatial SQL**
23
+ NOTE: Inputs and outputs are in meters and/or WGS 84 geographic coordinates in decimal degrees.
24
+
25
+ | Function | Question Input | Geo Input | SQL Execution Output |
26
+ |:---------:|:---------------:|:---------:|:-------------------------:|
27
+ | Area | Area question | Polygon | Number - Area sq meters |
28
+ | Centroid | Center question | Polygon | Point |
29
+ | Buffer | Buffer distance | Point | Polygon |
30
+ | Length | Length question | Line | Number - Length in meters |
31
+
32
+ ### Example Prompt / Prompt File
33
+
34
+ <|begin_of_text|><|start_header_id|>system<|end_header_id|>
35
+ <p></p>
36
+ You are a helpful assistant. You are an expert at PostGIS and Postgresql and SQL and psql.
37
+ <p></p>
38
+ <|eot_id|><|start_header_id|>user<|end_header_id|>
39
+
40
+ \### Instruction: Write a PostGIS SQL statement for the following.
41
+ <p></p>
42
+
43
+ \### Input:
44
+
45
+ <p></p>
46
+ {input}
47
+
48
+ <p></p>
49
+
50
+ \### Response:
51
+
52
+ <|eot_id|><|start_header_id|>assistant<|end_header_id|>
53
+
54
+
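The template above can also be filled programmatically. The following is a minimal Python sketch, assuming the `<p></p>` tags are only rendering spacers, the backslashes before `###` are Markdown escapes, and blank lines separate the parts. The exact whitespace used during fine-tuning is not stated here, and `build_prompt` is a hypothetical helper name.

```python
SYSTEM_MESSAGE = (
    "You are a helpful assistant. You are an expert at PostGIS "
    "and Postgresql and SQL and psql."
)

# Assumption: blank lines stand in for the <p></p> spacers shown in the README.
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system}\n\n"
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "### Instruction: Write a PostGIS SQL statement for the following.\n\n"
    "### Input:\n\n"
    "{input}\n\n"
    "### Response:\n\n"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)


def build_prompt(question: str, wkt: str) -> str:
    """Join the question and the quoted WKT geometry, then fill the template."""
    # The examples use the form: <question> : '<WKT geometry>'
    model_input = f"{question} : '{wkt}'"
    return PROMPT_TEMPLATE.format(system=SYSTEM_MESSAGE, input=model_input)
```

For instance, `build_prompt("How long is the line?", "LINESTRING (-3.6976693 40.4263178, -3.6986082 40.4258729)")` yields a prompt whose input line matches the LENGTH example.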
55
+ ### Examples
56
+
57
+ AREA
58
+ <p></p>
59
+
60
+ \### Input: What is the area for the polygon? : 'Polygon ((-3.7515154 40.3855551, -3.7514972 40.3856581, -3.7507005 40.3855767, -3.7507167 40.3854722, -3.7515154 40.3855551))'
61
+ <p></p>
62
+
63
+ \### Response: SELECT ST_Area(geog) As area FROM (select 'Polygon ((-3.7515154 40.3855551, -3.7514972 40.3856581, -3.7507005 40.3855767, -3.7507167 40.3854722, -3.7515154 40.3855551))' :: geography geog) subquery;
64
+
65
+ <p></p>
66
+ CENTROID
67
+ <p></p>
68
+
69
+ \### Input: What is the centroid for the polygon? : 'Polygon ((-3.6934636 40.4808785, -3.6933352 40.4811486, -3.6930125 40.4810598, -3.693141 40.4807897, -3.6934636 40.4808785))'
70
+ <p></p>
71
+
72
+ \### Response: SELECT ST_AsText(ST_AsText(ST_Centroid(geog))) As centroid FROM (select 'Polygon ((
73
+ -3.6934636 40.4808785, -3.6933352 40.4811486, -3.6930125 40.4810598, -3.693141 40.4807897, -3.6934636 40.4808785))' :: geography geog) subquery;
74
+
75
+ <p></p>
76
+ BUFFER
77
+ <p></p>
78
+
79
+ \### Input: What is the thousand meter buffer for the following point? : 'Point(-8.7522658 41.3862664)'
80
+ <p></p>
81
+
82
+ \### Response: SELECT ST_AsText(ST_Buffer(geog, 1000)) as buffer FROM (select 'Point(-8.7522658 41.3862664)' :: geography geog) subquery;
83
+
84
+ <p></p>
85
+ LENGTH
86
+ <p></p>
87
+
88
+ \### Input: How long is the line? : 'LINESTRING (-3.6976693 40.4263178, -3.6986082 40.4258729)'
89
+ <p></p>
90
+
91
+ \### Response: SELECT ST_Length(geog) As length FROM (select 'LINESTRING (-3.6976693 40.4263178, -3.6986082 40.4258729)' :: geography geog) subquery;
92
+ <p></p>
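The four example responses share one pattern: a subquery casts the quoted WKT to `geography`, and the outer `SELECT` applies the spatial function. The Python sketch below copies that pattern into templates, which could serve as a reference when comparing model output. The names `SQL_TEMPLATES`, `reference_sql`, and `normalize` are hypothetical and not part of the model or its training code.

```python
# Templates copied from the example responses; {wkt} and {meters} are fill-ins.
SQL_TEMPLATES = {
    "area": "SELECT ST_Area(geog) As area FROM (select '{wkt}' :: geography geog) subquery;",
    "centroid": "SELECT ST_AsText(ST_AsText(ST_Centroid(geog))) As centroid FROM (select '{wkt}' :: geography geog) subquery;",
    "buffer": "SELECT ST_AsText(ST_Buffer(geog, {meters})) as buffer FROM (select '{wkt}' :: geography geog) subquery;",
    "length": "SELECT ST_Length(geog) As length FROM (select '{wkt}' :: geography geog) subquery;",
}


def reference_sql(function: str, wkt: str, meters=None) -> str:
    """Build the SQL that the examples show for one of the four functions."""
    if function not in SQL_TEMPLATES:
        raise ValueError(f"unsupported function: {function}")
    if function == "buffer" and meters is None:
        raise ValueError("buffer requires a distance in meters")
    return SQL_TEMPLATES[function].format(wkt=wkt, meters=meters)


def normalize(sql: str) -> str:
    """Collapse runs of whitespace so line breaks do not affect a comparison."""
    return " ".join(sql.split())
```

Comparing `normalize(model_output)` with `normalize(reference_sql(...))` is only a string check; it does not confirm that the statement runs in PostGIS.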
93
+
94
+
95
+ ### A Few Known Question Variation Examples
96
+
97
+ <p></p>
98
+ AREA
99
+ <p></p>
100
+ What is the area for the geometry?
101
+ <p></p>
102
+ What is the area for this polygon?
103
+ <p></p>
104
+ CENTROID
105
+ <p></p>
106
+ What is the centroid for the geometry?
107
+ <p></p>
108
+ What is the center point of the polygon?
109
+ <p></p>
110
+ BUFFER
111
+ <p></p>
112
+ What is the 100 meter buffer for the following point?
113
+ <p></p>
114
+ Buffer the following point a thousand meters.
115
+ <p></p>
116
+ What is the 1000 meter buffer for the following point?
117
+ <p></p>
118
+ LENGTH
119
+ <p></p>
120
+ What is the length of the line?
121
+ <p></p>
122
+ How long is this line?
123
+
124
+
125
+ ### llama.cpp / Hyperparameter Recommendations For Inference
126
+ max context ~ 8,000 or lower
127
+ <p></p>
128
+ top k ~ 100
129
+ <p></p>
130
+ temp ~ 0.4-0.5 or lower
131
+
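These recommendations can be passed as sampling arguments. The sketch below assumes the llama-cpp-python bindings, whose `Llama` object is callable with `max_tokens`, `temperature`, `top_k`, and `stop` and returns a dict with a `choices` list. The helper `generate_sql`, the `max_tokens` default, and the `<|eot_id|>` stop string are illustrative assumptions, not settings from the source.

```python
# Sampling values from the recommendations above (temp at the low end of the range).
SAMPLING = {"top_k": 100, "temperature": 0.4}

# Recommended upper bound for the context window; with llama-cpp-python this
# would be passed when loading the model, e.g. Llama(model_path=..., n_ctx=N_CTX).
N_CTX = 8000


def generate_sql(llm, prompt: str, max_tokens: int = 512) -> str:
    """Call a llama-cpp-python style completion callable and return its text."""
    result = llm(prompt, max_tokens=max_tokens, stop=["<|eot_id|>"], **SAMPLING)
    return result["choices"][0]["text"].strip()
```

Because `llm` is passed in, the helper can be tried with a stub before a model file is available; whether a GGUF conversion of this model exists is not stated here.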
132
+ ### Agent Considerations
133
+ Agents are being considered as a separate project. Agents would mostly be related to pulling the coordinates from a mapping UI, and executing the SQL from responses against a PostGIS database.
134
+
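An agent that executes model responses against a database would likely need a guard in front of the execution step. The following Python sketch shows one possible check, assuming responses should be a single `SELECT` that calls only the functions seen in the examples. The allow-list and the name `is_safe_spatial_select` are hypothetical, and the check is not a full SQL parser.

```python
import re

# Functions that appear in the example responses (assumed allow-list).
ALLOWED_FUNCTIONS = {"st_area", "st_centroid", "st_buffer", "st_length", "st_astext"}
# Keywords that can precede "(" without being function calls.
KEYWORDS = {"select", "from", "as"}


def is_safe_spatial_select(sql: str) -> bool:
    """Return True for one SELECT statement that calls only allowed functions."""
    statement = sql.strip()
    if statement.endswith(";"):
        statement = statement[:-1]
    # Blank out quoted literals so WKT such as 'Point(...)' is not read as a call.
    statement = re.sub(r"'[^']*'", "''", statement)
    # Reject stacked statements and anything that is not a SELECT.
    if ";" in statement or not statement.lower().startswith("select"):
        return False
    # Treat every identifier directly followed by "(" as a function call.
    names = re.findall(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(", statement)
    called = {name.lower() for name in names} - KEYWORDS
    return bool(called) and called <= ALLOWED_FUNCTIONS
```

A check like this only narrows what reaches the database; running the statement under a read-only role would still be a sensible precaution.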
135
+ ### Further Reference
136
+ https://postgis.net/docs/manual-3.3/PostGIS_Special_Functions_Index.html#PostGIS_GeographyFunctions
137
+
138
+ ### Evaluation data
139
+ More information needed
140
+
141
+ ### Training data
142
+ Custom synthetic
143
+
144
+
145
+ ### Training hyperparameters
146
+
147
+ The following hyperparameters were used during training:
148
+ - learning_rate: 2e-06
149
+ - train_batch_size: 1
150
+ - eval_batch_size: 1
151
+ - distributed_type: multi-GPU
152
+ - num_devices: 2
153
+ - total_train_batch_size: 100
154
+ - total_eval_batch_size: 10
155
+ - optimizer: Adam 8bit
156
+ - lr_scheduler_type: linear
157
+ - lr_scheduler_warmup_steps: 10
158
+ - num_epochs: 3
159
+
160
+ ### Training results
161
+
162
+ | Training Loss | Epoch | Step | Validation Loss |
163
+ |:-------------:|:------:|:----:|:---------------:|
164
+ | 0.5438 | 1 | 10 | 0.5247 |
165
+ | 0.4889 | 2 | 20 | 0.4494 |
166
+ | 0.4072 | 3 | 30 | 0.4051 |
167
+
168
+
169
+ ### Framework versions
170
+
171
+ - Transformers 4.44.0
172
+ - PyTorch 2.4.0
173
+ - peft 0.12.0
174
+ - Datasets 2.21.0
175
+ - Tokenizers 0.19.1