Finetune Qwen2.5 14b?

by ElvisM - opened

For 16 GB VRAM cards, I think this one is probably SOTA for story writing, mostly because, unlike Llama and Mistral, it handles long context pretty well. Any plans to fine-tune it one day? It would be very appreciated. I think I speak for most people when I say I'm tired of running into problems once the context reaches 16k tokens.

Owner

@ElvisM

This is up next. Very impressed with the Qwen 2.5s; however, I will try to emulate this approach (and Brainstorm too):

https://huggingface.co/DavidAU/DeepSeek-Grand-Horror-SMB-R1-Distill-Llama-3.1-16B-GGUF

This takes only the "reasoning/thinking" parts of DeepSeek and connects them to the core model.
The core model is fully retained, and augmented with DeepSeek's tech.
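For anyone curious what that kind of graft looks like in practice, here is a minimal, hypothetical sketch, not the actual recipe behind the linked model. It assumes both checkpoints share the Llama architecture so their decoder layers are shape-compatible; the model IDs, layer indices, and splice point are placeholders for illustration only.

```python
# Hypothetical sketch: splice a band of "reasoning" decoder layers from a
# DeepSeek-R1-distilled donor into a core model's layer stack, keeping every
# original core layer so the base model is fully retained.
import torch
from transformers import AutoModelForCausalLM

CORE_ID = "core-model/placeholder"             # placeholder: the story-writing base model
DONOR_ID = "deepseek-r1-distill/placeholder"   # placeholder: the R1-distilled donor

core = AutoModelForCausalLM.from_pretrained(CORE_ID, torch_dtype=torch.bfloat16)
donor = AutoModelForCausalLM.from_pretrained(DONOR_ID, torch_dtype=torch.bfloat16)

# Pick a band of donor layers presumed to carry the "reasoning/thinking"
# behaviour (indices are illustrative) and insert them into the core stack.
donor_band = [donor.model.layers[i] for i in range(8, 12)]
insert_at = 16  # illustrative splice point

new_stack = (
    list(core.model.layers[:insert_at])
    + donor_band
    + list(core.model.layers[insert_at:])
)
core.model.layers = torch.nn.ModuleList(new_stack)
core.config.num_hidden_layers = len(core.model.layers)

# Note: on newer transformers versions each attention module caches a layer_idx,
# which would need renumbering after a splice for KV-cache correctness.
core.save_pretrained("grafted-model")
```

The resulting model is larger than either parent (which is how an 8B base plus donor layers ends up around 16B), and would normally be followed by a healing finetune so the grafted layers settle in.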
