When running inference on `Llama3-ChatQA-1.5-70B` with vLLM, generation does not stop at the special token `<|im_end|>` and keeps producing text past it, as shown in the figure below. This PR adds `<|im_end|>` to the tokenizer; the corresponding mapping also needs to be added to `generation_config.json`.
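For reference, a minimal workaround sketch (not part of this PR's diff): pass `<|im_end|>` as a stop string at inference time, and look up its token id for the stop-token mapping in `generation_config.json`. The Hugging Face repo id below, and the assumption that the mapping goes into `eos_token_id`, are mine, not stated in this PR.

```python
# Sketch only: assumes the model is published as nvidia/Llama3-ChatQA-1.5-70B.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "nvidia/Llama3-ChatQA-1.5-70B"

# Once <|im_end|> is in the tokenizer, this prints the token id that would
# be added to the "eos_token_id" list in generation_config.json.
tok = AutoTokenizer.from_pretrained(model_id)
print(tok.convert_tokens_to_ids("<|im_end|>"))

# Runtime workaround: treat <|im_end|> as a stop string so vLLM halts
# generation even without the generation_config.json change.
llm = LLM(model=model_id)
params = SamplingParams(max_tokens=256, stop=["<|im_end|>"])
outputs = llm.generate(["User: What is ChatQA?\n\nAssistant:"], params)
print(outputs[0].outputs[0].text)
```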