Little brother(s) of big DeepSeek-R1?

#124
by MrDevolver - opened

I know there are small distilled models and I highly appreciate each one of them, but I feel like they could have been much better if they were created from scratch just like the big R1 model. You know, something that would be completely your awesome creation! If I could run the big model locally on my PC, I would. Unfortunately, I can only run smaller models, up to 32B (heavily quantized at the high end).

So, could the big model R1 get real little brother(s) - completely DeepSeek at core, please? That would be awesome! ❤

A 40B MoE would be really cool. Still need a Mixtral successor.

Still too big for local use on most consumer PCs. I like how Meta does it: their Llama 3 8B beats Llama 2 13B in both size and quality. DeepSeek, on the other hand, previously released a 16B MoE model which runs surprisingly well on my PC. Inference is faster than what I'm getting from the much smaller Llama 3 or from Qwen 14B.
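That speed gap makes sense if you look at active rather than total parameters: an MoE only runs a few experts per token, so per-token decode compute (and weight reads) scale with the active count. Here's a rough back-of-envelope sketch; the parameter figures are my assumptions (DeepSeekMoE 16B reportedly activates ~2.8B of its ~16.4B parameters per token), not exact numbers:

```python
# Back-of-envelope: why a 16B MoE can decode faster than a 14B dense model.
# Per-token decode cost scales with *active* parameters (~2 FLOPs per param
# for the forward pass). Parameter counts below are assumptions for illustration.

MODELS = {
    "DeepSeekMoE 16B (MoE)": {"total_params_b": 16.4, "active_params_b": 2.8},
    "Qwen 14B (dense)":      {"total_params_b": 14.0, "active_params_b": 14.0},
    "Llama 3 8B (dense)":    {"total_params_b": 8.0,  "active_params_b": 8.0},
}

for name, m in MODELS.items():
    # Dense models activate every parameter on every token; MoE activates
    # only the routed experts, so the MoE does far less work per token.
    gflops_per_token = 2 * m["active_params_b"]  # billions of params -> GFLOPs
    print(f"{name:24s} total={m['total_params_b']:5.1f}B "
          f"active={m['active_params_b']:4.1f}B "
          f"~{gflops_per_token:5.1f} GFLOPs/token")
```

The catch is that memory footprint still follows *total* parameters: all ~16B weights have to fit in RAM/VRAM even though each token only touches ~2.8B of them, so the MoE trades memory for speed.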

I want to believe the ultimate goal should be to invent an architecture that allows higher quality from smaller models. My dream is that one day we will see ~8B models beating ChatGPT 3.5 and 14B models reaching even higher (and I mean truly beating those big models, not just on benchmarks but in real-world use), making them a very competent solution for offline local inference when you don't have access to the internet.
