---
license: apache-2.0
base_model:
- google/canine-s
---

This model is an artifact of the SNAILS project. We fine-tuned `google/canine-s` to perform word naturalness classification. Each word is assigned one of three naturalness labels:

- **N1 ("Regular" naturalness):** full, unabbreviated words.
- **N2 ("Low" naturalness):** somewhat abbreviated words.
- **N3 ("Least" naturalness):** very abbreviated or indecipherable words.
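
As a minimal sketch, the label scheme above can be expressed as a lookup table for interpreting predictions. The names `NATURALNESS_LEVELS` and `describe_label` are hypothetical helpers, not part of this repository, and the sketch assumes the classifier emits the label strings `N1`, `N2`, and `N3`. Check the model's configuration for the labels it actually returns.

```python
# Hypothetical mapping from SNAILS naturalness labels to their descriptions.
# Assumes the classifier emits the strings "N1", "N2", and "N3".
NATURALNESS_LEVELS = {
    "N1": ("Regular", "full, unabbreviated word"),
    "N2": ("Low", "somewhat abbreviated word"),
    "N3": ("Least", "very abbreviated or indecipherable word"),
}


def describe_label(label: str) -> str:
    """Return a human-readable description of a naturalness label."""
    try:
        level, meaning = NATURALNESS_LEVELS[label]
    except KeyError:
        # Reject anything outside the three-level scheme.
        raise ValueError(f"unknown naturalness label: {label!r}") from None
    return f"{label}: {level} naturalness ({meaning})"


if __name__ == "__main__":
    for label in ("N1", "N2", "N3"):
        print(describe_label(label))
```

Raising on unknown labels, rather than returning a default, makes a mismatch between this assumed mapping and the model's real label set surface immediately.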

Inference with this model requires a token-tagging pre-processing step, which is implemented in `tokenprocessing.py`.

The easiest way to use this model is to download the `snails_naturalness_classifier.py` and `tokenprocessing.py` files from this repository and run `snails_naturalness_classifier.py`.

For more information about the SNAILS project and to access the training data, see the GitHub repository: https://www.github.com/KyleLuoma/SNAILS

Read the paper: https://dl.acm.org/doi/10.1145/3709727

Citing this model:

```bibtex
@article{10.1145/3709727,
  author = {Luoma, Kyle and Kumar, Arun},
  title = {SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference},
  year = {2025},
  issue_date = {February 2025},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {3},
  number = {1},
  url = {https://doi.org/10.1145/3709727},
  doi = {10.1145/3709727},
  journal = {Proc. ACM Manag. Data},
  month = feb,
  articleno = {77},
  numpages = {26},
}
```