I extended this method to create a draft model for the full `deepseek-r1` model

#3
by jukofyork - opened

https://huggingface.co/jukofyork/DeepSeek-R1-DRAFT-0.5B

This explains how I extended your idea:

https://github.com/jukofyork/transplant-vocab

to make a draft model that is mostly coherent (at least for speculative sampling) without any fine-tuning.
