Can you distill qwen-2.5-72b?

#30
by xldistance - opened

Can you distill qwen-2.5-72b?

S/he meant, of course, distilling DeepSeek R1 with Qwen2.5 72B as the student model (if you still have some GPUs left for that :)
(We only have the smaller Qwen2.5 32B in that role, not the 72B. That may explain why the Qwen 32B distillation was beaten on many benchmarks by the distillation that used the 70B Llama 3.x, a student model more than twice as large.)
