I extended this method to create a draft model for the full `deepseek-r1` model
#3, opened by jukofyork
https://huggingface.co/jukofyork/DeepSeek-R1-DRAFT-0.5B
This repository explains how I extended your idea to make a draft model that is mostly coherent (at least for speculative sampling) without any fine-tuning:
https://github.com/jukofyork/transplant-vocab
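
For anyone wondering why "mostly coherent" is good enough here: below is a rough sketch of the standard speculative sampling accept/reject rule, not code taken from the repository above. The function name `speculative_accept` and the toy probability lists are made up for illustration. It assumes the draft and target distributions are over the same vocabulary, which is why the draft needs the target's tokenizer in the first place.

```python
import random

def speculative_accept(draft_token, p_target, q_draft, rng=random):
    """One accept/reject step of speculative sampling (illustrative sketch).

    p_target, q_draft: probability lists over the SAME vocabulary.
    draft_token: index the draft model sampled from q_draft.
    Returns the token index to emit.
    """
    p = p_target[draft_token]
    q = q_draft[draft_token]
    # Accept the draft token with probability min(1, p / q).
    if q > 0 and rng.random() < min(1.0, p / q):
        return draft_token
    # On rejection, resample from the renormalised residual max(0, p - q).
    residual = [max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft)]
    if sum(residual) == 0.0:
        # Degenerate case: nothing left in the residual, sample the target directly.
        return rng.choices(range(len(p_target)), weights=p_target)[0]
    return rng.choices(range(len(residual)), weights=residual)[0]

# Toy usage: the draft disagrees with the target, yet emitted tokens follow the target.
rng = random.Random(0)
p, q = [0.6, 0.4], [0.2, 0.8]
draft = rng.choices([0, 1], weights=q)[0]
print(speculative_accept(draft, p, q, rng))
```

Because rejected tokens are resampled from the residual, the emitted tokens follow the target distribution however rough the draft is. A weaker draft only lowers the acceptance rate, and with it the speed-up.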