crumb committed
Commit f3df309 · 1 Parent(s): 7b7d647

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -4,11 +4,9 @@ datasets:
 language:
 - en
 tags:
-- llama
-- openllama
 - switchtransformer
-- moe
-- lora
+- llama
+- MoE
 ---
 This is the very first testing switchllama model from MoLora2, starting from OpenLlama-3b-v2 and adding 4 experts in the MLP blocks of the model. The experts were trained with QLora and merged properly (in 4bit) after individually training adapters on `gate_proj, up_proj, down_proj`. The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4) then their trained MLP weights were taken and transplanted in a model initialized from OpenLlama-3b with 4 switchtransformer experts. The routers are not trained in this version of the model and are randomly initialized.
 
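
The README paragraph above describes the construction: four LLaMA-style MLP experts, each trained with QLoRA on `gate_proj`, `up_proj`, and `down_proj`, merged, and then transplanted into a switch-transformer-style MoE block whose router is left randomly initialized. Below is a minimal PyTorch sketch of that layout, not the repository's actual code; the class names, the top-1 routing rule, and the dimensions (3200/8640, taken from the OpenLlama-3b config) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LlamaStyleMLP(nn.Module):
    """LLaMA-style gated MLP: down_proj(silu(gate_proj(x)) * up_proj(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class SwitchMLP(nn.Module):
    """Switch-transformer-style block: a top-1 router over N MLP experts.

    The router is randomly initialized and untrained, mirroring the card's
    note that routers are not trained in this version of the model.
    """

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [LlamaStyleMLP(hidden_size, intermediate_size) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden) -- each token goes to its single top-1 expert,
        # scaled by the router probability (the standard switch-transformer rule).
        probs = self.router(x).softmax(dim=-1)    # (batch, seq, num_experts)
        top_prob, top_idx = probs.max(dim=-1)     # (batch, seq)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                out[mask] = expert(x[mask]) * top_prob[mask].unsqueeze(-1)
        return out


# "Transplanting": copy each cluster-trained, LoRA-merged MLP into an expert slot.
# The stand-in modules below take the place of the 4 merged expert MLPs.
hidden, inter = 3200, 8640  # assumed OpenLlama-3b MLP sizes; shrink for a quick test
moe_layer = SwitchMLP(hidden, inter, num_experts=4)
cluster_mlps = [LlamaStyleMLP(hidden, inter) for _ in range(4)]
for expert, trained in zip(moe_layer.experts, cluster_mlps):
    expert.load_state_dict(trained.state_dict())

# The router stays randomly initialized; quick shape check:
tokens = torch.randn(1, 8, hidden)
print(moe_layer(tokens).shape)  # torch.Size([1, 8, 3200])
```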