---
license: apache-2.0
language:
- en
- zh
metrics:
- accuracy
library_name: transformers
tags:
- text2sql
---
# text2sql-8b-instruct-v1

## 1. Summary

text2sql-8b-instruct-v1 is a natural-language-to-SQL model optimized for Chinese and English users. It is based on the llama-3-chinese-8b-instruct-v3 model, and we used up-to-date optimization techniques to improve its performance, especially in handling complex queries and multi-table joins.

### 1.1 Characteristics

- Bilingual support: handles natural language queries in both Chinese and English (see the example below).
- High accuracy: extensive testing against real database queries shows that the generated SQL statements are highly accurate.
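As a quick illustration of the bilingual support, the same request can be phrased in either language and placed in the `###Input:` slot of the prompt format shown in the Usage section below. The schema and queries here are made-up examples for illustration, not taken from the training data:

```python
# Hypothetical bilingual inputs for an imaginary table employees(name, department, salary).
# Either string can be used as the user message in the chat format shown under "Usage".
english_input = "###Input:\nList the names of all employees in the Sales department.\n\n###Response:"
chinese_input = "###Input:\n列出销售部门所有员工的姓名。\n\n###Response:"  # the same request, in Chinese
```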
### 1.2 Training Data

The training data comes from multiple sources, including:

- Open-source datasets (such as WikiSQL and Spider)
- Internally generated datasets covering a variety of query types and complexity levels
- User feedback data, used to continuously improve model performance

The training data was strictly screened and cleaned to ensure quality and diversity.

### 1.3 Test Results

Results on multiple benchmark datasets show that the model outperforms existing models in both accuracy and generation efficiency. For example:

- On the WikiSQL dataset, the model achieved an execution accuracy of 87.5%.
- On the Spider dataset, the model achieved an execution accuracy of 95.3%.

These results indicate that the model has clear advantages in handling complex queries and multi-table joins.
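Execution accuracy here means the fraction of generated SQL statements whose execution results match those of the reference (gold) queries. Below is a minimal sketch of how such a metric can be computed, assuming a SQLite database and a hypothetical list of `(predicted_sql, gold_sql)` pairs; the official benchmark scripts handle result ordering, multiple databases, and other edge cases more carefully:

```python
import sqlite3

def execution_accuracy(pairs, db_path):
    """Return the fraction of (predicted_sql, gold_sql) pairs whose results match.

    `pairs` and `db_path` are hypothetical inputs, used here only to illustrate the metric.
    """
    conn = sqlite3.connect(db_path)
    matches = 0
    for predicted_sql, gold_sql in pairs:
        try:
            pred_rows = set(conn.execute(predicted_sql).fetchall())
            gold_rows = set(conn.execute(gold_sql).fetchall())
            matches += int(pred_rows == gold_rows)
        except sqlite3.Error:
            pass  # a query that fails to execute counts as incorrect
    conn.close()
    return matches / len(pairs) if pairs else 0.0
```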
## 2. Usage

Please upgrade the `transformers` package so that it supports Llama 3 models (e.g. `pip install -U transformers`); the version we currently use is `4.41.2`.

```python
# Use a pipeline as a high-level helper
from transformers import pipeline
import torch

model_id = "xbrain/text2sql-8b-instruct-v1"

# Chat-style prompt: the system message describes the database schema,
# and the user message carries the natural language question.
messages = [
    {"role": "system",
     "content": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\n database contains tables such as table_name_30. Table table_name_30 has columns such as nfl_team, draft_year."},
    {"role": "user",
     "content": "###Input:\nIn 1978 what is the NFL team?\n\n###Response:"},
]

# Load the model once; bfloat16 plus device_map="auto" places it on the available devices.
pipe_msg = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Generate the SQL statement and print the assistant's reply.
outputs = pipe_msg(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
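If you want finer control over generation than the pipeline helper offers, the same prompt can be run through the tokenizer and model directly. The following is a minimal sketch using the standard `transformers` chat-template API; it reuses the `messages` list from the example above, and the generation settings are illustrative rather than recommended values:

```python
# Lower-level alternative to the pipeline call above.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "xbrain/text2sql-8b-instruct-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# `messages` is the same chat list used in the pipeline example above.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens (the SQL statement).
sql = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(sql)
```

With greedy decoding this should produce output comparable to the pipeline example.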
## 3. Ethical Considerations

While fine-tuned for text-to-SQL generation, this model inherits the ethical considerations of the base Llama 3 model. Use it responsibly and implement additional safeguards as needed for your application.

## 4. Availability

The model is available through:

- [Hugging Face](https://huggingface.co/xbrain/text2sql-8b-instruct-v1)

For full details on responsible use, ethical considerations, and the latest benchmarks, please refer to the [official Llama 3 documentation](https://llama.meta.com/).