safetensors vs pth file

#162
by agoudarzi - opened

Has anybody tried loading the model from the pth file? I get vastly different behavior when I load the model from the pth file:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

state = torch.load('tmp/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth')
config = AutoConfig.from_pretrained("tmp/Meta-Llama-3-8B-Instruct/config.json")
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path='tmp/Meta-Llama-3-8B-Instruct', state_dict=state, config=config)

vs the standard loading:

model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B-Instruct', config=config)

Has anybody seen this?

When the team converts from the original PyTorch format to the transformers format, we run logit checks to ensure all logits are a 1-to-1 match. The results could potentially differ due to different generation parameters or because the prompt is structured differently.
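
For reference, a minimal sketch of the transformers-side generation path for this model, with the prompt structured through the Instruct chat template (the sampling values below are illustrative assumptions, not required settings):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Structure the prompt with the Instruct model's chat template
messages = [{"role": "user", "content": "Explain the difference between safetensors and pth files."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# <|eot_id|> marks end of turn for the Instruct model, so include it as a stop token
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(input_ids, max_new_tokens=256, eos_token_id=terminators, do_sample=True, temperature=0.6, top_p=0.9)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))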

Thanks @osanseviero. To be clear, it is not a slight difference: the output from the PyTorch checkpoint is scrambled text. Is there any example of generation using the PyTorch format I can look at? Are you saying the decoding is potentially different?

For the original checkpoint, you should use the original codebase: https://github.com/meta-llama/llama3
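
With that codebase, chat generation looks roughly like the sketch below (based on the example scripts in the llama3 repo; the exact arguments may differ, the paths follow the directory layout in your question, and the script is normally launched with torchrun rather than plain python):

# Sketch of chat generation with the original consolidated.00.pth checkpoint
# via the meta-llama/llama3 codebase; run with e.g.
#   torchrun --nproc_per_node 1 this_script.py
from llama import Dialog, Llama

generator = Llama.build(
    ckpt_dir="tmp/Meta-Llama-3-8B-Instruct/original/",
    tokenizer_path="tmp/Meta-Llama-3-8B-Instruct/original/tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)

dialogs: list[Dialog] = [[{"role": "user", "content": "Hello, who are you?"}]]
results = generator.chat_completion(dialogs, max_gen_len=None, temperature=0.6, top_p=0.9)
print(results[0]["generation"]["content"])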

osanseviero changed discussion status to closed