How does this work? Thanks!

#1
by YaTharThShaRma999 - opened

It seems really interesting, but how does it work? You did give a description, but I didn't really understand it. Could you explain it a bit further, and a bit more simply? Thanks anyway!

Basically it's an encoder-decoder model made from a pretrained GPT-style model (here, Mistral 7B). In the first phase it is trained to encode and decode a 64-token sentence. In the next phase I train a "router head" that reconstructs the 64 tokens from the sentence embedding and then predicts the next 64 tokens. You can then use the resulting model to guide sampling, encode text for later retrieval, etc.
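To make the two phases concrete, here is a minimal toy sketch of that setup in PyTorch. Everything here is hypothetical: the small transformer encoder, the mean-pooled sentence embedding, the vocabulary size, and both linear heads are stand-ins, not the actual Mistral-7B-based implementation.

```python
import torch
import torch.nn as nn

SEQ_LEN = 64   # sentence length from the description above
D_MODEL = 256  # toy embedding size (Mistral 7B itself uses 4096)
VOCAB = 1000   # toy vocabulary

class SentenceAutoencoder(nn.Module):
    """Toy stand-in for the two-phase setup described above:
    phase 1 trains encode/decode of a 64-token sentence;
    phase 2 trains a 'router head' that reconstructs those 64 tokens
    and also predicts the next 64 tokens from the same embedding."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        # small encoder standing in for the pretrained backbone
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True),
            num_layers=1)
        # phase 1: head that reconstructs the original 64 tokens
        self.decode_head = nn.Linear(D_MODEL, SEQ_LEN * VOCAB)
        # phase 2: router head that predicts the *next* 64 tokens
        self.router_head = nn.Linear(D_MODEL, SEQ_LEN * VOCAB)

    def encode(self, tokens):
        h = self.encoder(self.embed(tokens))   # (B, 64, D)
        return h.mean(dim=1)                   # (B, D) sentence embedding

    def forward(self, tokens):
        z = self.encode(tokens)                # one embedding per sentence
        recon = self.decode_head(z).view(-1, SEQ_LEN, VOCAB)
        nxt = self.router_head(z).view(-1, SEQ_LEN, VOCAB)
        return recon, nxt

model = SentenceAutoencoder()
tokens = torch.randint(0, VOCAB, (2, SEQ_LEN))  # batch of 2 sentences
recon, nxt = model(tokens)
print(recon.shape, nxt.shape)  # logits over the current and next 64 tokens
```

The key idea is that both heads read from the single sentence embedding `z`, so once trained, that embedding can be stored for retrieval or used to steer generation of the next 64 tokens.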

Ah, ok. Thanks for explaining it!
