---
language:
- en
license: cc-by-4.0
library_name: peft
datasets:
- Salesforce/xlam-function-calling-60k
---
|
|
|
## Model Details |
|
|
|
This is an adapter for [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) fine-tuned for function calling on [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k). The adapter is intentionally undertrained; its main purpose is testing the function-calling capabilities of LLMs.
|
|
|
```python
import os

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use bf16 and FlashAttention if the GPU supports them, otherwise fall back to fp16 + SDPA
if torch.cuda.is_bf16_supported():
    os.system("pip install flash_attn")
    compute_dtype = torch.bfloat16
    attn_implementation = "flash_attention_2"
else:
    compute_dtype = torch.float16
    attn_implementation = "sdpa"

adapter = "kaitchup/Meta-Llama-3-8B-xLAM-Adapter"
model_name = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

# Load the base model, then attach the adapter
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=compute_dtype,
    device_map={"": 0},
    attn_implementation=attn_implementation,
)
model = PeftModel.from_pretrained(model, adapter)

# The query is wrapped in <user> tags, followed by a <tools> section for the model to complete
prompt = "<user>Check if the numbers 8 and 1233 are powers of two.</user>\n\n<tools>"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding (temperature is ignored when do_sample=False, so it is omitted)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=150)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
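
For deployment, the adapter can be merged into the base model so that inference no longer requires PEFT. A minimal sketch, assuming the loading code above has already run and that the adapter is a LoRA-style adapter supported by `merge_and_unload` (the output directory name is a hypothetical example):

```python
# Merge the adapter weights into the base model and drop the PEFT wrapper
merged_model = model.merge_and_unload()

# Hypothetical output directory, used here for illustration only
merged_model.save_pretrained("Meta-Llama-3-8B-xLAM-merged")
tokenizer.save_pretrained("Meta-Llama-3-8B-xLAM-merged")
```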
|
|
|
|
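If GPU memory is limited, the base model can instead be loaded in 4-bit with bitsandbytes before attaching the adapter. A minimal sketch; the quantization settings below are illustrative assumptions, not the configuration this adapter was trained with:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative NF4 quantization config (an assumption, not the training setup)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(model, "kaitchup/Meta-Llama-3-8B-xLAM-Adapter")
```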
|
- **Developed by:** [The Kaitchup](https://kaitchup.substack.com/)
- **Language(s) (NLP):** English
- **License:** cc-by-4.0