---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- code
- sql
- text2sql
- instruction_tuned
- basemodel
- jax
- pytorch
datasets:
- PipableAI/spider-bird
---

# Pipable’s pipSQL

Pipable’s pipSQL is a model distilled from Llama 1B to generate SQL queries from a prompt and a schema.

We used a unique pipeline in which the model works on two objectives alternately:

1. Maximizing the log probability of all tokens in the sequence (including the prompt tokens).

2. Minimizing the difference between the true value and the predicted maximum value of the output tokens, i.e. the generated tokens in the SQL query slice of the entire sequence.

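
One way to read the two objectives above is sketched here in plain Python on toy logits. This is an illustrative interpretation, not the training code used for pipSQL. In particular, treating objective 2 as the gap between the highest predicted log-probability and the true token's log-probability over the SQL slice is an assumption, as are the function names, the toy numbers, and the even/odd alternation schedule.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def full_sequence_nll(step_logits, targets):
    """Objective 1: mean negative log-likelihood over every position,
    prompt tokens included."""
    losses = [-log_softmax(l)[t] for l, t in zip(step_logits, targets)]
    return sum(losses) / len(losses)


def sql_slice_gap(step_logits, targets, sql_start):
    """Objective 2 (assumed reading): mean gap between the highest
    predicted log-probability and the true token's log-probability,
    computed only on the SQL slice of the sequence."""
    gaps = []
    for l, t in zip(step_logits[sql_start:], targets[sql_start:]):
        lp = log_softmax(l)
        gaps.append(max(lp) - lp[t])
    return sum(gaps) / len(gaps)


def alternating_loss(step, step_logits, targets, sql_start):
    """Hypothetical schedule: even steps use objective 1, odd steps objective 2."""
    if step % 2 == 0:
        return full_sequence_nll(step_logits, targets)
    return sql_slice_gap(step_logits, targets, sql_start)


# Toy example: 4 positions, vocabulary of 3 tokens, SQL slice starts at index 2.
toy_logits = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 0.3],
    [0.1, 0.4, 2.2],
    [1.0, 1.2, 0.3],
]
toy_targets = [0, 1, 2, 0]
toy_sql_start = 2

for step in range(2):
    print(step, alternating_loss(step, toy_logits, toy_targets, toy_sql_start))
```

Under this reading, objective 2 is zero whenever the true token is already the model's top prediction at every position of the SQL slice.
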
## License

The model's new weights, along with all other assets associated with it, are open-sourced under the MIT license.

## How to Use

```python
text = """<schema>{schema}</schema>
<question>{question}</question>
<sql>"""
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
# Load the model and move it to the target device.
model = AutoModelForCausalLM.from_pretrained("PipableAI/pipSQL").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pipSQL")

# `text` is the prompt template with {schema} and {question} filled in.
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Keep only the text between <sql> and </sql>.
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
```

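
The prompt template and the post-processing in the snippet above can be wrapped in two small helpers. This is a minimal sketch: `build_prompt`, `extract_sql`, and the example schema and question are hypothetical names and values introduced for illustration, not part of the pipSQL release.

```python
PROMPT_TEMPLATE = """<schema>{schema}</schema>
<question>{question}</question>
<sql>"""


def build_prompt(schema, question):
    """Fill the prompt template with a schema and a natural-language question."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question)


def extract_sql(decoded):
    """Return the text between <sql> and </sql> in the decoded output.

    Returns an empty string if no <sql> tag is present, and everything
    after <sql> if the closing tag is missing (e.g. generation was cut off).
    """
    parts = decoded.split("<sql>", 1)
    if len(parts) < 2:
        return ""
    return parts[1].split("</sql>", 1)[0].strip()


# Hypothetical schema and question, used only to show the prompt format.
example_schema = "CREATE TABLE singer (id INT, name TEXT, age INT);"
example_question = "List the names of all singers older than 30."

text = build_prompt(example_schema, example_question)
print(text)
```

The resulting `text` can be passed to the tokenizer as in the snippet above, and `extract_sql` can be applied to the decoded output in place of the chained `split` calls.
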
## The PipableAI team

Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya, Gyan Ranjan