Ontocord.AI committed on
Commit 7442181 · 1 Parent(s): 0bcf6f8

Update README.md

Files changed (1)
  1. README.md +13 -1
README.md CHANGED

colorTo: purple
sdk: static
pinned: false
---

# Multi-Domain Expert Layers (MDEL) Training
## How to increase knowledge without breaking the bank?

Volunteers from:
Ontocord.AI, Bedrock.AI, TurkuNLP, ETH, Redmond.AI, Incite, MICS CentraleSupelec, Centro de Excelência em Inteligência Artificial, VietAI, Technion - Israel Institute of Technology, Nous Research, University of Western Australia, LAION.AI

Open sourcing AI models can lead to increased innovation, accessibility, transparency, and community building. However, we need a mechanism to train more capable models in an efficient and modular way.

The proposed method, which we call Multi-Domain Expert Layers (MDEL) training, works for open-source language models by branching from a base model, training each branch independently on a specific domain for specific layers, and merging the trained models at the end. The domain-specific layers are kept as experts, with a classifier used as a router to activate the appropriate experts during inference. This approach makes it easy to increase a model's expertise, to train additional "adapters" independently, and to reuse previously trained experts and models without retraining, resulting in a modular and efficient system.
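
No implementation is published here, so the sketch below is only an illustration of the routing idea in PyTorch: the class names (`MDELBlock`, `ExpertLayer`), the feed-forward expert shape, and the hard top-1 routing from a mean-pooled hidden state are all assumptions, not the project's actual code.

```python
# Illustrative sketch only: not MDEL's actual architecture.
import torch
import torch.nn as nn


class ExpertLayer(nn.Module):
    """One domain-specialized copy of a feed-forward sub-layer."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(x)  # residual connection


class MDELBlock(nn.Module):
    """A layer position where per-domain experts replace a shared layer.

    A small classifier over the mean-pooled hidden state acts as the
    router, activating one domain expert per sequence at inference.
    """

    def __init__(self, d_model: int, d_ff: int, n_domains: int):
        super().__init__()
        self.experts = nn.ModuleList(
            ExpertLayer(d_model, d_ff) for _ in range(n_domains)
        )
        self.router = nn.Linear(d_model, n_domains)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        domain = self.router(x.mean(dim=1)).argmax(dim=-1)  # (batch,)
        out = torch.empty_like(x)
        for d, expert in enumerate(self.experts):
            mask = domain == d
            if mask.any():
                out[mask] = expert(x[mask])
        return out


# Toy usage: two domain experts at a single layer position.
block = MDELBlock(d_model=64, d_ff=256, n_domains=2)
hidden = torch.randn(4, 16, 64)  # (batch, seq_len, d_model)
print(block(hidden).shape)       # torch.Size([4, 16, 64])
```

Under this reading, adding a new domain amounts to training one more `ExpertLayer` and extending the router head, with the rest of the model untouched.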

In this effort, we seek international labs and open-source-aligned researchers and companies in various countries to each train a set of domain experts of their choosing, thereby enabling international participation and knowledge sharing. Reuse and lower energy usage will also mean lower training costs and a smaller environmental impact. We currently have volunteers from four continents and are looking for more.

We will be using a variant of the c-BTM method (https://arxiv.org/pdf/2303.14177v1.pdf) and will focus on models of around 20B parameters.
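
For context, c-BTM (the linked paper) trains one expert per corpus cluster and, at inference, sparsely ensembles the experts' next-token distributions, weighted by how well the context matches each expert's cluster. Below is a minimal sketch of that mixing step; the function name, tensor shapes, and top-k choice are assumptions for illustration, not the paper's reference code.

```python
import torch


def cbtm_mixture(expert_logits: torch.Tensor,
                 cluster_scores: torch.Tensor,
                 top_k: int = 2) -> torch.Tensor:
    """Mix per-expert next-token distributions, c-BTM style.

    expert_logits:  (n_experts, vocab_size) logits from each expert
    cluster_scores: (n_experts,) affinity of the current context to
                    each expert's training cluster
    """
    weights = torch.softmax(cluster_scores, dim=-1)
    # Sparse ensembling: keep only the top-k experts and renormalize.
    top_vals, top_idx = weights.topk(top_k)
    sparse = torch.zeros_like(weights)
    sparse[top_idx] = top_vals / top_vals.sum()
    probs = torch.softmax(expert_logits, dim=-1)      # (n_experts, vocab)
    return (sparse.unsqueeze(-1) * probs).sum(dim=0)  # (vocab_size,)


# Toy usage: four experts over a ten-token vocabulary.
mixed = cbtm_mixture(torch.randn(4, 10), torch.randn(4))
print(mixed.sum())  # ~1.0, i.e. a valid next-token distribution
```

Because only the top-k experts contribute to each prediction, inference cost stays close to that of a single dense model even as more domain experts are added.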