safetensors vs pth file

#162
by agoudarzi - opened

Has anybody tried loading the model from the pth file? I get vastly different behavior when I load the model from the pth file:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

state = torch.load('tmp/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth')
config = AutoConfig.from_pretrained("tmp/Meta-Llama-3-8B-Instruct/config.json")
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path='tmp/Meta-Llama-3-8B-Instruct', state_dict=state, config=config)

vs the standard loading:

model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B-Instruct', config=config)

Has anybody seen this?

When the team converts from the original PyTorch format to the transformers format, we run logit checks to ensure all logits are a 1-to-1 match. The results could potentially differ due to different generation parameters or because the prompt is structured differently.
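
For reference, a minimal sketch of the transformers-side generation path for this model, with the prompt structured through the Instruct chat template (the sampling values below are illustrative assumptions, not required settings):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Structure the prompt with the Instruct model's chat template
messages = [{"role": "user", "content": "Explain the difference between safetensors and pth files."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# <|eot_id|> marks end of turn for the Instruct model, so include it as a stop token
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(input_ids, max_new_tokens=256, eos_token_id=terminators, do_sample=True, temperature=0.6, top_p=0.9)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))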

Thanks @osanseviero. To be clear, it is not a slight difference: the output from the PyTorch checkpoint is scrambled text. Is there any example of generation using the PyTorch format I can look at? Are you saying the decoding is potentially different?

For the original checkpoint, you should use the original codebase: https://github.com/meta-llama/llama3
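
With that codebase, chat generation looks roughly like the sketch below (based on the example scripts in the llama3 repo; the exact arguments may differ, the paths follow the directory layout in your question, and the script is normally launched with torchrun rather than plain python):

# Sketch of chat generation with the original consolidated.00.pth checkpoint
# via the meta-llama/llama3 codebase; run with e.g.
#   torchrun --nproc_per_node 1 this_script.py
from llama import Dialog, Llama

generator = Llama.build(
    ckpt_dir="tmp/Meta-Llama-3-8B-Instruct/original/",
    tokenizer_path="tmp/Meta-Llama-3-8B-Instruct/original/tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)

dialogs: list[Dialog] = [[{"role": "user", "content": "Hello, who are you?"}]]
results = generator.chat_completion(dialogs, max_gen_len=None, temperature=0.6, top_p=0.9)
print(results[0]["generation"]["content"])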

osanseviero changed discussion status to closed