Update README.md
README.md CHANGED
@@ -4,11 +4,9 @@ datasets:
 language:
 - en
 tags:
-- llama
-- openllama
 - switchtransformer
--
--
+- llama
+- MoE
 ---
 This is the very first testing switchllama model from MoLora2, starting from OpenLlama-3b-v2 and adding 4 experts to the MLP blocks of the model. The experts were trained with QLoRA and merged properly (in 4-bit) after individually training adapters on `gate_proj, up_proj, down_proj`. The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4), and their trained MLP weights were then transplanted into a model initialized from OpenLlama-3b with 4 switchtransformer experts. The routers are not trained in this version of the model and are randomly initialized.
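
Since the switch-style MLP is not a stock `transformers` architecture, loading presumably requires the repo's custom modeling code via `trust_remote_code=True`. A minimal loading sketch, assuming the merged checkpoint is hosted on the Hub (the repo id below is a placeholder, not the actual path):

```python
# Minimal loading sketch -- repo_id is a placeholder; trust_remote_code is
# assumed to be needed for the custom switchtransformer MLP blocks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "namespace/switchllama-test"  # placeholder, replace with this model's repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain what a mixture-of-experts MLP block is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the routers are randomly initialized in this release, generations mainly sanity-check that the merged experts load and route without error rather than reflect tuned expert selection.

For readers unfamiliar with switchtransformer-style MoE, here is a conceptual sketch of a 4-expert switch MLP with an untrained router. It is illustrative only, not the repo's actual modeling code; the class name and dimensions are made up for the example.

```python
# Conceptual sketch of a top-1 (switch) routed MLP with LLaMA-style experts.
import torch
import torch.nn as nn


class SwitchMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        # Router is left at random initialization, mirroring this release.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Each expert mirrors the LLaMA MLP: gate_proj, up_proj, down_proj.
        self.experts = nn.ModuleList(
            nn.ModuleDict(
                {
                    "gate_proj": nn.Linear(hidden_size, intermediate_size, bias=False),
                    "up_proj": nn.Linear(hidden_size, intermediate_size, bias=False),
                    "down_proj": nn.Linear(intermediate_size, hidden_size, bias=False),
                }
            )
            for _ in range(num_experts)
        )
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Top-1 routing: each token is dispatched to exactly one expert.
        expert_idx = self.router(x).argmax(dim=-1)  # (batch, seq)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                h = x[mask]  # tokens routed to expert i
                out[mask] = expert["down_proj"](
                    self.act(expert["gate_proj"](h)) * expert["up_proj"](h)
                )
        return out


# Quick shape check with small illustrative dimensions.
mlp = SwitchMLP(hidden_size=64, intermediate_size=128, num_experts=4)
print(mlp(torch.randn(1, 8, 64)).shape)  # torch.Size([1, 8, 64])
```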