Reason behind not using special tokens in the prompt format?

#2 opened by Doctor-Shotgun

Hello, hobbyist model finetuner here. Thanks for sharing your training hyperparameters!

I was just curious whether there was a specific reason for not using dedicated special tokens for the role headers in the prompt format (such as the ones already defined in the Llama 3 tokenizer, i.e. <|start_header_id|>, etc.)?

It appears that the <|system|>, <|user|>, and <|assistant|> headers used in the prompt format are not defined special tokens, so they could in theory be variably tokenized into different combinations of substrings during training/inference.
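For anyone who wants to check this concretely, here's a minimal sketch (assuming the gated meta-llama/Meta-Llama-3-8B tokenizer from transformers; any Llama 3 checkpoint should behave the same) comparing how an unregistered header string and a registered special token get segmented:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# "<|assistant|>" is not a registered special token, so BPE splits it
# into several sub-tokens (the exact split can vary with surrounding text).
print(tokenizer.tokenize("<|assistant|>"))

# "<|start_header_id|>" IS a defined special token, so it always maps
# to a single, atomic token ID.
print(tokenizer.tokenize("<|start_header_id|>"))
print(len(tokenizer.encode("<|start_header_id|>", add_special_tokens=False)))  # 1
```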

From the paper it seems like some empirical testing was done. Was this also attempted with the tokens above defined as special?

I just found out about this and I'm curious as well.

@Doctor-Shotgun and @sszymczyk -- it's because we hard-set the chat template in open-instruct to be the same for every model. It's not necessarily optimal, but it is a simple approach we've been using for a few years, as the goal of our efforts is to easily translate recipes and code to OLMo.
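For context, hard-setting a template looks roughly like the sketch below: one chat template is assigned to whatever tokenizer is loaded, so every model trains and infers with the same <|system|>/<|user|>/<|assistant|> headers. The Jinja string here only approximates the Tulu-style format; the actual template lives in the open-instruct repo and may differ in details like whitespace and EOS handling.

```python
from transformers import AutoTokenizer

# Approximation of the Tulu-style template; the real one is defined
# in open-instruct and may differ in whitespace and EOS placement.
TULU_STYLE_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}<|system|>\n{{ message['content'] }}\n"
    "{% elif message['role'] == 'user' %}<|user|>\n{{ message['content'] }}\n"
    "{% elif message['role'] == 'assistant' %}<|assistant|>\n{{ message['content'] }}{{ eos_token }}\n"
    "{% endif %}{% endfor %}"
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tokenizer.chat_template = TULU_STYLE_TEMPLATE  # same template regardless of base model

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there."},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
```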

natolambert changed discussion status to closed