# WAIDWML - What Am I Doing With My Life?
(8 Phi-4s in a trenchcoat)
## Rationale
So there I was, having found some inspiration to tune stuff but lacking the disposable funds to do anything with the larger models. Enter Phi-4, a model designed for productivity... Initially this was just going to be a sequential series of finetunes, starting from the baseline Phi-4 and gradually adding more datasets until I either got bored or it got good, but then I had an idea: what if I just MoE'd it?
Yeah.
As a proof of concept, this wasn't too bad. The end result is... interesting, to say the least.
## Training
As mentioned above, this was done in "phases", each with a separate dataset. Most were trained with a `max_seq_length` of 32k; a few were dropped to 16k to make sure they fit on the hardware. The learning rate (`lr`) was all over the place, but generally somewhere between `1e-5` and `4e-6`. These were all separate LoRAs using `r=64` and `alpha=32` with rsLoRA enabled (a configuration sketch follows the phase list below). Epochs were 2 or 3 for everything except `c2`, as that would have taken far too long.
- `p1`: Private RP dataset (`RPT-Varied-Small`)
- `p2`: `TheDrummer/AmoralQA-v2`
- `p3`: `AIRRC/Eudaimonic`
- `p4`: Two private RP datasets (`cc-gpt4-sfw-sharegpt` & `cc-gpt4-nsfw-sharegpt`)
- `p5`: A random subset (approx. 30%) of the infamous "`c2`"-logs dataset, cleaned and deduped
- `p6`: Private RP dataset (`RPT-Varied-Small_v1.5`)
- `p7`: `NewEden/PIPPA-Mega-Filtered`
- `p8`: `Squish42/bluemoon-fandom-1-1-rp-cleaned`

(Note: the `RPT-Varied-Small` and `RPT-Varied-Small_v1.5` datasets are due to be released after I manually verify their fitness.)
Once all the LoRAs were trained, I separately merged each one into the base model, then used mergekit (config) to "merge" the results into a MoE. I chose to initialize the router randomly since I was going to train that part later. After that, I trained the routing layers for 8 epochs with `lr = 1e-6` and `grimulkan/LimaRP-augmented` as the dataset. It took roughly 8.5 hours on a 6xA40 instance on RunPod.
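
The per-expert merge step might look something like the following with peft; paths are hypothetical, and one merge like this would happen per phase LoRA. `merge_and_unload()` is what folds the adapter deltas back into the dense weights so mergekit can treat the result as a plain model:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Hypothetical paths; repeat once per phase LoRA (p1..p8).
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-4", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "loras/p1")

# Fold the LoRA deltas into the base weights, yielding a plain dense model
# that mergekit can pick up as one future "expert".
model = model.merge_and_unload()
model.save_pretrained("experts/p1")
```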
## Recommended Settings
Phi-4 format. What I used for my tests:
- Temp 1
- minP 0.05
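
A quick sketch of plugging those settings into transformers generation; the model id is a placeholder, and `apply_chat_template` is assumed to emit the Phi-4 chat format from the tokenizer config bundled with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/this-model"  # placeholder; substitute the actual repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Hi there."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Settings from above: temperature 1.0, min_p 0.05.
# (min_p sampling needs a reasonably recent transformers, ~4.41+.)
out = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=1.0, min_p=0.05)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```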
## FAQ
Q: Why not do anything constructive, like GRPO-tune a model of usable size?
A: Where's the fun in that?
Q: Are you, like, okay?
A: Objectively? Probably not. Subjectively? Never better.
Q: You know this still sucks for RP, right?
A: Yup. Should have pivoted to reasoning and code once R1 hit, but sunk cost and all kept me on this trajectory.