---
datasets:
- crumb/Wizard-EvolInstruct70k-k4
language:
- en
tags:
- llama
- openllama
- switchtransformer
- moe
- lora
---
This is the very first testing switchllama model from MoLora2, starting from OpenLlama-3b-v2 and adding 4 experts in the MLP blocks of the model. The experts were trained with QLoRA and merged properly (in 4-bit) after individually training adapters on `gate_proj`, `up_proj`, and `down_proj`. The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4), and their trained MLP weights were then transplanted into a model initialized from OpenLlama-3b with 4 switch-transformer experts. The routers are not trained in this version of the model and are randomly initialized.
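
As a rough illustration of the structure described above, here is a minimal sketch of one switch-style MLP block: four experts with the same `gate_proj`/`up_proj`/`down_proj` shapes as the dense LlamaMLP (so the cluster-trained weights can be copied straight into the expert slots) plus a randomly initialized top-1 router. The class name and module layout are illustrative only, not this repository's actual modeling code; the dimensions are taken from the OpenLlama-3b config.

```python
import torch
import torch.nn as nn


class SwitchMLP(nn.Module):
    """Illustrative switch-style replacement for a LlamaMLP block."""

    def __init__(self, hidden_size=3200, intermediate_size=8640, num_experts=4):
        super().__init__()
        # Router is left randomly initialized in this release of the model.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Each expert mirrors the dense LlamaMLP projections, so the merged
        # QLoRA-trained weights from one cluster model fill one expert slot.
        self.experts = nn.ModuleList(
            nn.ModuleDict({
                "gate_proj": nn.Linear(hidden_size, intermediate_size, bias=False),
                "up_proj": nn.Linear(hidden_size, intermediate_size, bias=False),
                "down_proj": nn.Linear(intermediate_size, hidden_size, bias=False),
            })
            for _ in range(num_experts)
        )
        self.act = nn.SiLU()

    def forward(self, hidden_states):
        # Top-1 (switch) routing: every token is sent to a single expert.
        routes = self.router(hidden_states).argmax(dim=-1)
        out = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            mask = routes == i
            if mask.any():
                x = hidden_states[mask]
                x = expert["down_proj"](
                    self.act(expert["gate_proj"](x)) * expert["up_proj"](x)
                )
                out[mask] = x
        return out
```

Since the router above is random, token-to-expert assignment is effectively arbitrary in this version; only the transplanted expert MLP weights carry training signal until the routers are trained in a later release.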