crumb committed on
Commit 7b7d647 · Parent: f1ea0c8

Update README.md

Files changed (1): README.md (+3 -1)
@@ -10,4 +10,6 @@ tags:
 - moe
 - lora
 ---
-This is the very first testing switchllama model from MoLora2, starting from OpenLlama-3b-v2 and adding 4 experts in the MLP blocks of the model. The experts were trained with QLora and merged properly (in 4bit) after individually training adapters on `gate_proj, up_proj, down_proj`. The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4) then their trained MLP weights were taken and transplanted in a model initialized from OpenLlama-3b with 4 switchtransformer experts. The routers are not trained in this version of the model and are randomly initialized.
+This is the very first testing switchllama model from MoLora2, starting from OpenLlama-3b-v2 and adding 4 experts in the MLP blocks of the model. The experts were trained with QLora and merged properly (in 4bit) after individually training adapters on `gate_proj, up_proj, down_proj`. The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4) then their trained MLP weights were taken and transplanted in a model initialized from OpenLlama-3b with 4 switchtransformer experts. The routers are not trained in this version of the model and are randomly initialized.
+
+Modeling code is not included until this proof-of-concept is entirely trained.
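Since the modeling code is withheld, here is a minimal numpy sketch of the architecture the README describes: a Switch Transformer–style top-1 router dispatching each token to one of 4 LLaMA-style gated MLP experts (`gate_proj`, `up_proj`, `down_proj`), with the router left randomly initialized. The `SwitchMLP` class, its dimensions, and its initialization scale are all hypothetical illustration, not the actual MoLora2 code.

```python
import numpy as np

def silu(x):
    # SiLU activation used by LLaMA-family gated MLPs
    return x * (1.0 / (1.0 + np.exp(-x)))

class SwitchMLP:
    """Hypothetical sketch of a top-1 (switch-style) MoE over gated MLPs.

    Each expert owns its own gate_proj/up_proj/down_proj matrices (the
    weights trained per data cluster); the router is randomly
    initialized, matching the untrained routers in this checkpoint.
    """
    def __init__(self, d_model, d_ff, n_experts=4, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained router: random projection from hidden dim to expert logits
        self.router = rng.normal(0.0, 0.02, (d_model, n_experts))
        self.experts = [
            {
                "gate_proj": rng.normal(0.0, 0.02, (d_model, d_ff)),
                "up_proj":   rng.normal(0.0, 0.02, (d_model, d_ff)),
                "down_proj": rng.normal(0.0, 0.02, (d_ff, d_model)),
            }
            for _ in range(n_experts)
        ]

    def __call__(self, x):
        # x: (tokens, d_model); each token is routed to exactly one expert
        logits = x @ self.router
        choice = logits.argmax(axis=-1)  # top-1 routing decision per token
        out = np.zeros_like(x)
        for i, e in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                h = silu(x[mask] @ e["gate_proj"]) * (x[mask] @ e["up_proj"])
                out[mask] = h @ e["down_proj"]
        return out

# Toy usage with made-up dimensions (OpenLlama-3b's real dims are larger)
mlp = SwitchMLP(d_model=8, d_ff=16, n_experts=4)
x = np.random.default_rng(1).normal(size=(5, 8))
y = mlp(x)
```

With a random router the expert assignment is effectively arbitrary, which is why the README flags this checkpoint as a proof of concept rather than a usable model.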