kyleluoma/SNAILS-word-naturalness-classifier


	---
	license: apache-2.0
	base_model:
	- google/canine-s
	---

	This model is an artifact of the SNAILS project. We fine-tuned google/canine-s to classify word naturalness.
	Full (unabbreviated) words have "Regular" naturalness (label N1). Somewhat abbreviated words have "Low" naturalness (label N2).
	Very abbreviated or indecipherable words have "Least" naturalness (label N3).
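
The three-level label scheme can be sketched as a small lookup in Python. This is an illustrative helper only, not part of the released scripts: the names `Naturalness`, `DESCRIPTIONS`, and `describe` are hypothetical, and the sketch assumes you already have a label string (N1, N2, or N3). How the model's raw outputs map to these labels is not stated here and is not assumed.

```python
from enum import Enum


class Naturalness(Enum):
    """Label names and naturalness levels as described in this card."""
    N1 = "Regular"
    N2 = "Low"
    N3 = "Least"


# Short descriptions paraphrased from the card text (hypothetical helper data).
DESCRIPTIONS = {
    Naturalness.N1: "full, unabbreviated word",
    Naturalness.N2: "somewhat abbreviated word",
    Naturalness.N3: "very abbreviated or indecipherable word",
}


def describe(label: str) -> str:
    """Turn a label string such as 'N2' into a human-readable summary."""
    level = Naturalness[label]  # raises KeyError for unknown labels
    return f"{label}: {level.value} naturalness ({DESCRIPTIONS[level]})"


print(describe("N2"))  # N2: Low naturalness (somewhat abbreviated word)
```

Using an `Enum` keeps the label names and their levels in one place, so an unexpected label fails loudly instead of passing through silently.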

	Inference with this model requires a token-tagging preprocessing step, which is provided in tokenprocessing.py.
	The easiest way to use the model is to download snails_naturalness_classifier.py and tokenprocessing.py from this
	repository and then run snails_naturalness_classifier.py.
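
One possible way to fetch the two scripts programmatically is with `hf_hub_download` from the `huggingface_hub` package. This is a minimal sketch under stated assumptions: the repository ID is taken from this page's header, the target directory name `snails_classifier` and the helper `download_scripts` are hypothetical, and the sketch does not assume anything about the arguments or dependencies that snails_naturalness_classifier.py itself expects.

```python
from pathlib import Path

# Assumption: repository ID as shown in this page's header.
REPO_ID = "kyleluoma/SNAILS-word-naturalness-classifier"
# File names as given in this card.
SCRIPT_FILES = ("snails_naturalness_classifier.py", "tokenprocessing.py")


def download_scripts(target_dir: str = "snails_classifier") -> list[Path]:
    """Download the two helper scripts into target_dir and return their paths."""
    # Imported lazily so the module loads even if huggingface_hub is missing.
    from huggingface_hub import hf_hub_download

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in SCRIPT_FILES:
        # hf_hub_download returns the local path of the downloaded file.
        local = hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target)
        paths.append(Path(local))
    return paths
```

After calling `download_scripts()`, the classifier script can be run from the target directory, for example with `python snails_naturalness_classifier.py`. Check the script itself for any additional requirements before running it.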

	For more information about the SNAILS project and to access the training data:
	GitHub repository: https://www.github.com/KyleLuoma/SNAILS

	Read the paper:
	https://dl.acm.org/doi/10.1145/3709727

	Citing this model:

	```
	@article{10.1145/3709727,
	author = {Luoma, Kyle and Kumar, Arun},
	title = {SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference},
	year = {2025},
	issue_date = {February 2025},
	publisher = {Association for Computing Machinery},
	address = {New York, NY, USA},
	volume = {3},
	number = {1},
	url = {https://doi.org/10.1145/3709727},
	doi = {10.1145/3709727},
	journal = {Proc. ACM Manag. Data},
	month = feb,
	articleno = {77},
	numpages = {26},
	}
	```