---
license: apache-2.0
base_model:
- google/canine-s
---

This model is an artifact of the SNAILS project. We fine-tuned google/canine-s to classify word naturalness.
Full (unabbreviated) words have "Regular" naturalness (labeled N1). Somewhat abbreviated words have "Low" naturalness (labeled N2).
Very abbreviated or indecipherable words have "Least" naturalness (labeled N3).
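
As a small illustration of the label scheme above, the sketch below maps each label code to its naturalness level. The names `NATURALNESS_LEVELS` and `describe_label` are hypothetical helpers and are not part of this repository. It also assumes the classifier reports labels as the strings N1, N2, and N3, which may differ from the actual output format.

```python
# Label codes and level names as described in this model card.
# Assumption: the classifier reports labels as the strings "N1", "N2", "N3".
NATURALNESS_LEVELS = {
    "N1": "Regular",  # full (unabbreviated) words
    "N2": "Low",      # somewhat abbreviated words
    "N3": "Least",    # very abbreviated or indecipherable words
}


def describe_label(label: str) -> str:
    """Return a readable description of a label code (hypothetical helper)."""
    code = label.strip().upper()
    if code not in NATURALNESS_LEVELS:
        raise ValueError(f"Unknown naturalness label: {label!r}")
    return f"{code} ({NATURALNESS_LEVELS[code]} naturalness)"


if __name__ == "__main__":
    # Print a description for each of the three label codes.
    for code in NATURALNESS_LEVELS:
        print(describe_label(code))
```

A helper like this could be applied to the classifier's output to make the label codes easier to read in reports.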

Inference with this model requires a token-tagging preprocessing step, which is provided in tokenprocessing.py.
The easiest way to use the model is to download snails_naturalness_classifier.py and tokenprocessing.py from this
repository and run snails_naturalness_classifier.py.

For more information about the SNAILS project and to access the training data:
GitHub repository: https://www.github.com/KyleLuoma/SNAILS

Read the paper:
https://dl.acm.org/doi/10.1145/3709727

Citing this model:

```
@article{10.1145/3709727,
author = {Luoma, Kyle and Kumar, Arun},
title = {SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference},
year = {2025},
issue_date = {February 2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {3},
number = {1},
url = {https://doi.org/10.1145/3709727},
doi = {10.1145/3709727},
journal = {Proc. ACM Manag. Data},
month = feb,
articleno = {77},
numpages = {26},
}
```