---
base_model: unsloth/Llama-3.2-1B-Instruct
library_name: peft
license: llama3.2
datasets:
- gretelai/synthetic_text_to_sql
language:
- en
pipeline_tag: text-generation
tags:
- SQL
- Text-to-SQL
- SQL-generation
---

# Model Card for Llama3.2-SQL-1B

## Model Details

This model is a fine-tuned version of Llama-3.2-1B-Instruct, optimized for text-to-SQL generation. It was trained on the **gretelai/synthetic_text_to_sql** dataset, which contains synthetic natural-language questions and their corresponding SQL queries across a variety of domains.

The model learns to:

- Understand natural-language instructions.
- Generate syntactically correct and context-aware SQL queries.
- Interpret structured schema information when it is included in the prompt.

### Model Description

- **Developed by:** Rustam Shiriyev
- **Model type:** Instruction-tuned model for text-to-SQL generation
- **Language(s) (NLP):** English
- **License:** Llama 3.2
- **Finetuned from model:** unsloth/Llama-3.2-1B-Instruct

## Uses

### Direct Use

- Natural-language-to-SQL translation
- Educational or research applications
- Lightweight inference for SQL query generation in small-scale tasks or apps

## Bias, Risks, and Limitations

- May not handle deeply nested queries or complex joins reliably.

## How to Get Started with the Model

```python
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

login(token="")  # your Hugging Face access token

tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B-Instruct")

base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-1B-Instruct",
    device_map={"": 0},
    token="",
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Rustamshry/Llama3.2-SQL-1B")

question = "What are the vehicle safety testing organizations that operate in the UK and France?"

context = """
CREATE TABLE SafetyOrgs (name VARCHAR(20), country VARCHAR(10));
INSERT INTO SafetyOrgs (name, country) VALUES ('Euro NCAP', 'UK');
INSERT INTO SafetyOrgs (name, country) VALUES ('ADAC', 'Germany');
INSERT INTO SafetyOrgs (name, country) VALUES ('UTAC', 'France');
"""

instruction = (
    "You are a skilled SQL assistant. "
    "Using the given database context, generate the correct SQL query to answer the question.\n\n"
    f"Context: {context.strip()}"
)

prompt = (
    f"### Instruction:\n{instruction}\n\n"
    f"### Question:\n{question}\n\n"
    f"### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

A sketch for merging the adapter into the base model for standalone deployment is included at the end of this card.

## Training Details

### Training Data

- **Dataset:** gretelai/synthetic_text_to_sql, which consists of 100,000 synthetic examples of natural-language questions paired with corresponding SQL queries and explanations.

### Training Procedure

The model was fine-tuned with Unsloth using LoRA (a configuration sketch matching these settings appears at the end of this card):

- LoRA rank: 8
- Alpha: 16

#### Training Hyperparameters

- batch size: 8
- gradient accumulation steps: 4
- optimizer: adamw_torch
- learning rate: 2e-5
- warmup steps: 100
- fp16: True
- epochs: 2
- weight decay: 0.01
- lr scheduler type: linear

#### Speeds, Sizes, Times

- Training time: 8 hours
- Speed: 0.22 steps/sec

### Results

- Training loss: decreased from 1.42 to 0.48

### Framework versions

- PEFT 0.14.0
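For reference, here is a minimal sketch of a LoRA setup consistent with the configuration described under Training Procedure (rank 8, alpha 16, and the listed hyperparameters). The actual Unsloth training script is not included in this card, so the `target_modules` list, `lora_dropout` value, and `output_dir` are assumptions rather than the author's verbatim settings.

```python
# Hypothetical reconstruction of the LoRA setup described in this card;
# treat target_modules, lora_dropout, and output_dir as assumptions.
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("unsloth/Llama-3.2-1B-Instruct")

# LoRA settings stated under "Training Procedure" (rank 8, alpha 16).
# target_modules is an assumption: the attention projections commonly
# adapted in Llama-style models.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

# Hyperparameters listed under "Training Hyperparameters"; these arguments
# would be passed to a Trainer together with the tokenized dataset.
training_args = TrainingArguments(
    output_dir="llama32-sql-1b",  # assumed output path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    optim="adamw_torch",
    learning_rate=2e-5,
    warmup_steps=100,
    fp16=True,
    num_train_epochs=2,
    weight_decay=0.01,
    lr_scheduler_type="linear",
)
```

With alpha 16 and rank 8, the adapter output is scaled by alpha/r = 2, a common default for low-rank LoRA.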
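Building on the quick-start snippet above (where `model` is the loaded `PeftModel` and `tokenizer` is the base tokenizer), the adapter can optionally be merged into the base weights so the model can be served without a runtime peft dependency. This is a sketch using PEFT's standard `merge_and_unload()` method; the output directory name is arbitrary.

```python
# Fold the LoRA weights into the base model; `model` and `tokenizer` come
# from the "How to Get Started" snippet above.
merged_model = model.merge_and_unload()

# Save a standalone checkpoint (directory name is arbitrary).
merged_model.save_pretrained("Llama3.2-SQL-1B-merged")
tokenizer.save_pretrained("Llama3.2-SQL-1B-merged")
```

The merged checkpoint can then be loaded directly with `AutoModelForCausalLM.from_pretrained`.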