crumb committed on
Commit 7b7d647 · Parent: f1ea0c8

Update README.md

Files changed (1): README.md (+3 -1)
@@ -10,4 +10,6 @@ tags:
 - moe
 - lora
 ---
-This is the very first testing switchllama model from MoLora2, starting from OpenLlama-3b-v2 and adding 4 experts in the MLP blocks of the model. The experts were trained with QLora and merged properly (in 4bit) after individually training adapters on `gate_proj, up_proj, down_proj`. The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4) then their trained MLP weights were taken and transplanted in a model initialized from OpenLlama-3b with 4 switchtransformer experts. The routers are not trained in this version of the model and are randomly initialized.
+This is the very first testing switchllama model from MoLora2, starting from OpenLlama-3b-v2 and adding 4 experts in the MLP blocks of the model. The experts were trained with QLora and merged properly (in 4bit) after individually training adapters on `gate_proj, up_proj, down_proj`. The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4) then their trained MLP weights were taken and transplanted in a model initialized from OpenLlama-3b with 4 switchtransformer experts. The routers are not trained in this version of the model and are randomly initialized.
+
+Modeling code is not included until this proof-of-concept is entirely trained.
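Since the modeling code is withheld, here is a minimal numpy sketch of the architecture the README describes: a Switch Transformer–style top-1 router dispatching each token to one of 4 LLaMA-style gated MLP experts (`gate_proj`, `up_proj`, `down_proj`), with the router left randomly initialized. The `SwitchMLP` class, its dimensions, and its initialization scale are all hypothetical illustration, not the actual MoLora2 code.

```python
import numpy as np

def silu(x):
    # SiLU activation used by LLaMA-family gated MLPs
    return x * (1.0 / (1.0 + np.exp(-x)))

class SwitchMLP:
    """Hypothetical sketch of a top-1 (switch-style) MoE over gated MLPs.

    Each expert owns its own gate_proj/up_proj/down_proj matrices (the
    weights trained per data cluster); the router is randomly
    initialized, matching the untrained routers in this checkpoint.
    """
    def __init__(self, d_model, d_ff, n_experts=4, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained router: random projection from hidden dim to expert logits
        self.router = rng.normal(0.0, 0.02, (d_model, n_experts))
        self.experts = [
            {
                "gate_proj": rng.normal(0.0, 0.02, (d_model, d_ff)),
                "up_proj":   rng.normal(0.0, 0.02, (d_model, d_ff)),
                "down_proj": rng.normal(0.0, 0.02, (d_ff, d_model)),
            }
            for _ in range(n_experts)
        ]

    def __call__(self, x):
        # x: (tokens, d_model); each token is routed to exactly one expert
        logits = x @ self.router
        choice = logits.argmax(axis=-1)  # top-1 routing decision per token
        out = np.zeros_like(x)
        for i, e in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                h = silu(x[mask] @ e["gate_proj"]) * (x[mask] @ e["up_proj"])
                out[mask] = h @ e["down_proj"]
        return out

# Toy usage with made-up dimensions (OpenLlama-3b's real dims are larger)
mlp = SwitchMLP(d_model=8, d_ff=16, n_experts=4)
x = np.random.default_rng(1).normal(size=(5, 8))
y = mlp(x)
```

With a random router the expert assignment is effectively arbitrary, which is why the README flags this checkpoint as a proof of concept rather than a usable model.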