safetensors vs pth file
Has anybody tried to load the model from the .pth file? I get vastly different behavior when I load the model that way.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

state = torch.load('tmp/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth')
config = AutoConfig.from_pretrained('tmp/Meta-Llama-3-8B-Instruct/config.json')
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path='tmp/Meta-Llama-3-8B-Instruct', state_dict=state, config=config)
vs the standard loading:
model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B-Instruct', config=config)
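For reference, a minimal sketch to compare what the .pth checkpoint contains against what the transformers model expects (same paths as above; the model here is built from the config only, so no weights are loaded from disk):

# Sketch: inspect the key names in the original Meta checkpoint vs the key
# names the transformers LlamaForCausalLM expects; any mismatch shows up here.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Keys actually stored in the original Meta checkpoint.
state = torch.load(
    'tmp/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth',
    map_location='cpu',
)

# Keys the transformers model expects (randomly initialized from the config,
# nothing read from disk).
config = AutoConfig.from_pretrained('tmp/Meta-Llama-3-8B-Instruct')
expected = AutoModelForCausalLM.from_config(config).state_dict()

print(sorted(state.keys())[:5])
print(sorted(expected.keys())[:5])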
Has anybody seen this?
When the team converts from the original PyTorch format to the transformers format, we do logit checks to ensure all logits have a 1-to-1 match. The results could potentially differ due to different generation parameters or structuring the prompt differently.
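As a rough sketch, here is how I would hold both of those fixed with the transformers checkpoint (chat template applied, greedy decoding; the model path and prompt are only placeholders):

# Sketch: generate with the transformers checkpoint, applying the chat
# template and pinning the generation parameters (greedy decoding) so that
# neither prompt structure nor sampling explains any difference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'tmp/Meta-Llama-3-8B-Instruct'  # placeholder; the hub id also works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map='auto'
)

messages = [{"role": "user", "content": "Who are you?"}]  # example prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to(model.device)

# do_sample=False -> greedy decoding, no sampling randomness.
output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))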
Thanks @osanseviero. To be clear, it is not a slight difference: the output from the PyTorch checkpoint is scrambled text. Is there any example of generation using the PyTorch format that I can look at? Are you saying the decoding is potentially different?
For the original checkpoint, you should use the original codebase: https://github.com/meta-llama/llama3
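As a sketch (the paths are assumptions, and the script has to be launched with torchrun), generation from the original checkpoint mirrors the repo's example_chat_completion.py:

# Sketch for the original .pth checkpoint with the meta-llama/llama3 codebase.
# Launch with: torchrun --nproc_per_node 1 this_script.py
from llama import Llama

generator = Llama.build(
    ckpt_dir='tmp/Meta-Llama-3-8B-Instruct/original/',  # consolidated.00.pth + params.json
    tokenizer_path='tmp/Meta-Llama-3-8B-Instruct/original/tokenizer.model',
    max_seq_len=512,
    max_batch_size=1,
)

dialogs = [[{"role": "user", "content": "Who are you?"}]]  # example prompt
results = generator.chat_completion(dialogs, max_gen_len=64, temperature=0.6, top_p=0.9)
print(results[0]['generation']['content'])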