---
language:
- en
license: cc-by-4.0
library_name: peft
datasets:
- Salesforce/xlam-function-calling-60k
---
|
|
|
## Model Details |
|
|
|
This is an adapter for [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) fine-tuned for function calling on [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k). The adapter is intentionally undertrained; its main purpose is testing the function-calling capabilities of LLMs.
|
|
|
```python
import os

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use bf16 and FlashAttention if the GPU supports them, otherwise fall back to fp16 + SDPA
if torch.cuda.is_bf16_supported():
    os.system("pip install flash_attn")
    compute_dtype = torch.bfloat16
    attn_implementation = "flash_attention_2"
else:
    compute_dtype = torch.float16
    attn_implementation = "sdpa"

adapter = "kaitchup/Meta-Llama-3-8B-xLAM-Adapter"
model_name = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

# Load the base model, then attach the adapter
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=compute_dtype,
    device_map={"": 0},
    attn_implementation=attn_implementation,
)
model = PeftModel.from_pretrained(model, adapter)

# The query is wrapped in <user> tags, followed by a <tools> section for the model to complete
prompt = "<user>Check if the numbers 8 and 1233 are powers of two.</user>\n\n<tools>"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding (temperature is ignored when do_sample=False, so it is omitted)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=150)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
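
For deployment, the adapter can be merged into the base model so that inference no longer requires PEFT. A minimal sketch, assuming the loading code above has already run and that the adapter is a LoRA-style adapter supported by `merge_and_unload` (the output directory name is a hypothetical example):

```python
# Merge the adapter weights into the base model and drop the PEFT wrapper
merged_model = model.merge_and_unload()

# Hypothetical output directory, used here for illustration only
merged_model.save_pretrained("Meta-Llama-3-8B-xLAM-merged")
tokenizer.save_pretrained("Meta-Llama-3-8B-xLAM-merged")
```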
|
|
|
|
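If GPU memory is limited, the base model can instead be loaded in 4-bit with bitsandbytes before attaching the adapter. A minimal sketch; the quantization settings below are illustrative assumptions, not the configuration this adapter was trained with:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative NF4 quantization config (an assumption, not the training setup)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(model, "kaitchup/Meta-Llama-3-8B-xLAM-Adapter")
```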
|
- **Developed by:** [The Kaitchup](https://kaitchup.substack.com/)
- **Language(s) (NLP):** English
- **License:** cc-by-4.0