---
license: apache-2.0
language:
- en
- zh
metrics:
- accuracy
library_name: transformers
tags:
- text2sql
---
text2sql-8b-instruct-v1
1. Summary
text2sql-8b-instruct-v1 is a natural-language-to-SQL conversion model optimized for both Chinese and English users. It is based on the llama-3-chinese-8b-instruct-v3 model and fine-tuned with up-to-date optimization algorithms to improve performance, especially when handling complex queries and multi-table joins.
1.1 Characteristics
- Bilingual support: handles natural language queries in both Chinese and English (see the example after this list).
- High accuracy: extensive testing on real database queries shows that the generated SQL statements are highly accurate.
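For example, a Chinese question can be passed with the same message format used in the usage example in section 2. The snippet below is a hypothetical illustration that reuses the schema from that example and translates its English question into Chinese:

```python
# Hypothetical illustration: the same schema queried in Chinese.
# The message format matches the usage example in section 2 below.
messages_zh = [
    {
        "role": "system",
        "content": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.\n##Instruction:\n database contains tables such as table_name_30. Table table_name_30 has columns such as nfl_team, draft_year.",
    },
    {
        "role": "user",
        "content": "###Input:\n1978年选秀的NFL球队是哪支？\n\n###Response:",
    },
]
```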
1.2 Training data
The model's training data comes from multiple sources, including:
- Open-source datasets (such as WikiSQL and Spider)
- Internally generated datasets covering a variety of query types and complexity levels
- User feedback data used to continuously improve model performance
The training data is strictly screened and cleaned to ensure quality and diversity.
1.3 Test results
Results on multiple benchmark datasets show that the model exceeds other existing models in both accuracy and generation efficiency. For example:
- On the WikiSQL dataset, the model achieved an execution accuracy of 87.5%.
- On the Spider dataset, the model achieved an execution accuracy of 95.3%.
These results show that the model has significant advantages in handling complex queries and multi-table joins.
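Here, execution accuracy means that the generated SQL, when executed against the database, returns the same result set as the reference query. The sketch below illustrates that comparison on SQLite; it is a simplified stand-in, not the official WikiSQL/Spider evaluation script, and the helper name is hypothetical:

```python
# Simplified illustration of an execution-accuracy check (not the official evaluator).
import sqlite3

def execution_match(db_path: str, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if the predicted and gold queries produce the same rows."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = set(conn.execute(predicted_sql).fetchall())
        gold_rows = set(conn.execute(gold_sql).fetchall())
    finally:
        conn.close()
    return pred_rows == gold_rows
```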
2. Usage
Please upgrade the transformers package to a version that supports Llama 3 models. The version we use is 4.41.2.
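Before loading the model you can verify the installed version (a minimal sketch; the 4.41.2 threshold is the version noted above):

```python
# Minimal check that the installed transformers version is recent enough for Llama 3 models.
import transformers
from packaging import version

if version.parse(transformers.__version__) < version.parse("4.41.2"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old; "
        "please upgrade with `pip install -U transformers`."
    )
```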
```python
# Use a pipeline as a high-level helper
from transformers import pipeline
import torch

model_id = "xbrain/text2sql-8b-instruct-v1"

messages = [
    {
        "role": "system",
        "content": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\n database contains tables such as table_name_30. Table table_name_30 has columns such as nfl_team, draft_year.",
    },
    {
        "role": "user",
        "content": "###Input:\nIn 1978 what is the NFL team?\n\n###Response:",
    },
]

# Build a chat-style text-generation pipeline; device_map="auto" places the model on available devices
pipe_msg = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

outputs = pipe_msg(
    messages,
    max_new_tokens=256,
)

# The last entry of the generated conversation is the assistant turn containing the SQL statement
print(outputs[0]["generated_text"][-1])
```
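If you prefer working with the model and tokenizer directly rather than the pipeline helper, the sketch below shows an equivalent call path. It assumes the repository ships a standard Llama 3 chat template; anything beyond the pipeline example above is an assumption:

```python
# Sketch: load the model directly and apply the chat template by hand.
# Assumes the repo provides a Llama 3-style tokenizer and chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "xbrain/text2sql-8b-instruct-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# `messages` is the same list defined in the pipeline example above
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, i.e. the SQL statement
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```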
3. Ethical Considerations
While fine-tuned for text-to-SQL generation, this model inherits the ethical considerations of the base Llama 3 model. Use it responsibly and implement additional safeguards as needed for your application.
4. Availability
The model is available through the Hugging Face Hub as xbrain/text2sql-8b-instruct-v1.
For full details on responsible use, ethical considerations, and the latest benchmarks, please refer to the official Llama 3 documentation.