Update README.md
README.md CHANGED
@@ -4,11 +4,9 @@ datasets:
 language:
 - en
 tags:
-- llama
-- openllama
 - switchtransformer
--
--
+- llama
+- MoE
 ---
 This is the very first testing switchllama model from MoLora2, starting from OpenLlama-3b-v2 and adding 4 experts to the MLP blocks of the model. The experts were trained with QLoRA and merged properly (in 4-bit) after individually training adapters on `gate_proj, up_proj, down_proj`. The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4), and their trained MLP weights were then transplanted into a model initialized from OpenLlama-3b with 4 switchtransformer experts. The routers are not trained in this version of the model and are randomly initialized.
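
Since the switch-style MLP is not a stock `transformers` architecture, loading presumably requires the repo's custom modeling code via `trust_remote_code=True`. A minimal loading sketch, assuming the merged checkpoint is hosted on the Hub (the repo id below is a placeholder, not the actual path):

```python
# Minimal loading sketch -- repo_id is a placeholder; trust_remote_code is
# assumed to be needed for the custom switchtransformer MLP blocks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "namespace/switchllama-test"  # placeholder, replace with this model's repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain what a mixture-of-experts MLP block is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the routers are randomly initialized in this release, generations mainly sanity-check that the merged experts load and route without error rather than reflect tuned expert selection.

For readers unfamiliar with switchtransformer-style MoE, here is a conceptual sketch of a 4-expert switch MLP with an untrained router. It is illustrative only, not the repo's actual modeling code; the class name and dimensions are made up for the example.

```python
# Conceptual sketch of a top-1 (switch) routed MLP with LLaMA-style experts.
import torch
import torch.nn as nn


class SwitchMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        # Router is left at random initialization, mirroring this release.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Each expert mirrors the LLaMA MLP: gate_proj, up_proj, down_proj.
        self.experts = nn.ModuleList(
            nn.ModuleDict(
                {
                    "gate_proj": nn.Linear(hidden_size, intermediate_size, bias=False),
                    "up_proj": nn.Linear(hidden_size, intermediate_size, bias=False),
                    "down_proj": nn.Linear(intermediate_size, hidden_size, bias=False),
                }
            )
            for _ in range(num_experts)
        )
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Top-1 routing: each token is dispatched to exactly one expert.
        expert_idx = self.router(x).argmax(dim=-1)  # (batch, seq)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                h = x[mask]  # tokens routed to expert i
                out[mask] = expert["down_proj"](
                    self.act(expert["gate_proj"](h)) * expert["up_proj"](h)
                )
        return out


# Quick shape check with small illustrative dimensions.
mlp = SwitchMLP(hidden_size=64, intermediate_size=128, num_experts=4)
print(mlp(torch.randn(1, 8, 64)).shape)  # torch.Size([1, 8, 64])
```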