Finetune Qwen2.5 14b?

by ElvisM - opened

For 16 GB VRAM cards, I think this one is probably SOTA for story writing, mostly because, unlike Llama and Mistral, it handles long context pretty well. Any plans to fine-tune it one day? It would be very appreciated. I think I speak for most people when I say I'm tired of running into problems once the context reaches 16k tokens.

Owner

@ElvisM

This is up next. Very impressed with the Qwen 2.5s; however, I will try to emulate this approach (and Brainstorm too):

https://huggingface.co/DavidAU/DeepSeek-Grand-Horror-SMB-R1-Distill-Llama-3.1-16B-GGUF

This takes only the "reasoning/thinking" parts of DeepSeek and connects them to the core model.
The core model is fully retained, and augmented with DeepSeek's tech.
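For anyone curious what that kind of graft looks like in practice, here is a minimal, hypothetical sketch, not the actual recipe behind the linked model. It assumes both checkpoints share the Llama architecture so their decoder layers are shape-compatible; the model IDs, layer indices, and splice point are placeholders for illustration only.

```python
# Hypothetical sketch: splice a band of "reasoning" decoder layers from a
# DeepSeek-R1-distilled donor into a core model's layer stack, keeping every
# original core layer so the base model is fully retained.
import torch
from transformers import AutoModelForCausalLM

CORE_ID = "core-model/placeholder"             # placeholder: the story-writing base model
DONOR_ID = "deepseek-r1-distill/placeholder"   # placeholder: the R1-distilled donor

core = AutoModelForCausalLM.from_pretrained(CORE_ID, torch_dtype=torch.bfloat16)
donor = AutoModelForCausalLM.from_pretrained(DONOR_ID, torch_dtype=torch.bfloat16)

# Pick a band of donor layers presumed to carry the "reasoning/thinking"
# behaviour (indices are illustrative) and insert them into the core stack.
donor_band = [donor.model.layers[i] for i in range(8, 12)]
insert_at = 16  # illustrative splice point

new_stack = (
    list(core.model.layers[:insert_at])
    + donor_band
    + list(core.model.layers[insert_at:])
)
core.model.layers = torch.nn.ModuleList(new_stack)
core.config.num_hidden_layers = len(core.model.layers)

# Note: on newer transformers versions each attention module caches a layer_idx,
# which would need renumbering after a splice for KV-cache correctness.
core.save_pretrained("grafted-model")
```

The resulting model is larger than either parent (which is how an 8B base plus donor layers ends up around 16B), and would normally be followed by a healing finetune so the grafted layers settle in.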
