crumb committed · Commit f1ea0c8 · 1 Parent(s): da18cbb

Create README.md

Files changed (1)
  1. README.md +13 -0
README.md ADDED
@@ -0,0 +1,13 @@
+ ---
+ datasets:
+ - crumb/Wizard-EvolInstruct70k-k4
+ language:
+ - en
+ tags:
+ - llama
+ - openllama
+ - switchtransformer
+ - moe
+ - lora
+ ---
+ This is the very first test switchllama model from MoLora2, built by starting from OpenLlama-3b-v2 and adding 4 experts to the MLP blocks of the model. The experts were trained with QLoRA, individually training adapters on `gate_proj, up_proj, down_proj`, and then merged properly (in 4-bit). The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4), and their trained MLP weights were transplanted into a model initialized from OpenLlama-3b with 4 switch-transformer experts. The routers are not trained in this version of the model and remain randomly initialized.
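
As a rough illustration of the architecture described above (not the released modeling code), the sketch below shows a switch-style MLP block: four LLaMA-style SwiGLU experts behind a randomly initialized top-1 router, plus a hypothetical helper for dropping merged expert MLP weights into an expert slot. Class names, the helper, and the toy dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LlamaStyleMLP(nn.Module):
    """LLaMA/OpenLlama-style MLP: down_proj(silu(gate_proj(x)) * up_proj(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class SwitchMLP(nn.Module):
    """Switch-transformer-style MLP block with top-1 routing over N experts."""

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        # Router stays at its random initialization in this version of the model.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [LlamaStyleMLP(hidden_size, intermediate_size) for _ in range(num_experts)]
        )

    def forward(self, x):
        # x: (batch, seq, hidden); route each token to its single top expert.
        probs = self.router(x).softmax(dim=-1)       # (batch, seq, num_experts)
        top_prob, top_idx = probs.max(dim=-1)        # (batch, seq)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # Scale expert output by its router probability (switch-style).
                out[mask] = expert(x[mask]) * top_prob[mask].unsqueeze(-1)
        return out


def transplant_expert(switch_mlp: SwitchMLP, expert_idx: int, mlp_state: dict) -> None:
    """Hypothetical helper: copy merged QLoRA MLP weights (gate/up/down_proj)
    from one trained expert model into one expert slot of the switch block."""
    switch_mlp.experts[expert_idx].load_state_dict(mlp_state)


if __name__ == "__main__":
    # Toy dimensions for a quick shape check; the real 3b model uses much larger dims.
    block = SwitchMLP(hidden_size=64, intermediate_size=128, num_experts=4)
    x = torch.randn(2, 10, 64)
    print(block(x).shape)  # torch.Size([2, 10, 64])
```

In this sketch the transplant step would be repeated per layer and per expert after merging each QLoRA adapter into its base MLP; since the router weights are never trained here, routing decisions are effectively random until the router is tuned in a later version.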