# BriLLM: Brain-inspired Large Language Model

Our code: https://github.com/brillm05/BriLLM0.5

## Overview
This work introduces BriLLM, the first brain-inspired large language model. It is a generative language model that is neither a Transformer nor a GPT and does not follow the traditional machine-learning paradigm of controlling only the inputs and outputs. The model is based on the Signal Fully-connected flowing (SiFu) mechanism, defined on a directed graph in terms of the neural network, and every node of the graph is interpretable, in contrast to traditional machine-learning models, which offer only limited interpretability at the input and output ends.

## SiFu Mechanism
![Figure 1](./fig1.png)
> As shown in Figure 1, a SiFu model is a graph composed of multiple nodes that are sparsely activated and use tensors to transmit a nominal signal.
Each node (ideally, a layer of neurons) represents a certain concept or word, e.g., a noun, a verb, etc.
Each edge models the relationship between every node pair.
The signal is transmitted according to the magnitude of its energy. The energy is strengthened, i.e., maximized, on the right route; or, at least, the right path always keeps the maximal energy for the transmitted signal.
Each node is sequentially activated in terms of the maximized energy.
Route or path is determined in a competitive way, i.e., the next node is activated only if the energy can be maximally delivered to this node.
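
The routing rule can be made concrete with a minimal toy sketch. The snippet below only illustrates the competitive, energy-maximizing signal flow described above; the node names, tensor shapes, GeLU nonlinearity, and the use of a tensor norm as the energy score are illustrative assumptions, not the released implementation.

```python
# Toy illustration of SiFu-style routing (assumptions noted above, not BriLLM itself).
import torch

hidden = 4
nodes = ["dog", "chases", "cat", "sleeps"]

# One weight matrix and bias per directed edge (the graph is fully connected).
torch.manual_seed(0)
edges = {(u, v): (torch.randn(hidden, hidden), torch.randn(hidden))
         for u in nodes for v in nodes if u != v}

def propagate(signal, edge):
    """Send the signal tensor across one edge and return the resulting energy tensor."""
    W, b = edge
    return torch.nn.functional.gelu(signal @ W + b)

# Start a nominal signal at one node and let it flow for a few steps.
signal, current = torch.ones(hidden), "dog"
for _ in range(3):
    # Competitive routing: the neighbor whose edge delivers the maximal energy is activated.
    candidates = {v: propagate(signal, edges[(current, v)]) for v in nodes if v != current}
    current = max(candidates, key=lambda v: candidates[v].norm().item())
    signal = candidates[current]
    print(current)
```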

## Architecture
![Figure 2](./fig2.png)
> As shown in Figure 2, BriLLM implements the SiFu neural network for language modeling.
Each token in the vocabulary is modeled as a node, which is defined by a hidden layer of neurons in the neural network.
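
Because every token is a node and edges model every node pair, the number of per-edge parameter sets grows quadratically with the vocabulary size. A rough count, assuming one hidden × hidden weight matrix plus a bias per directed edge and an example vocabulary of 4,000 tokens (both assumptions, not figures from the released model):

```python
# Back-of-the-envelope edge-parameter count under the assumptions stated above.
vocab_size = 4_000                         # assumed example value
hidden = 32                                # matches hidden_size in the inference example below
per_edge = hidden * hidden + hidden        # one weight matrix plus one bias per directed edge
num_edges = vocab_size * (vocab_size - 1)  # fully connected directed graph, no self-loops
print(f"{num_edges * per_edge:,} edge parameters")  # 16,891,776,000 with these numbers
```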

## Training Network
![Figure 3](./fig3.png)
> To train on a sample, BriLLM builds an individual, conventional neural network each time and performs regular back-propagation (BP) training. This network consists of two parts: the front part connects all input nodes (i.e., tokens) and is followed by the rear part, which connects all possible paths in order. Finally, a softmax layer collects all paths' energy tensors so that a 0-1 ground-truth vector indicates the right path. We adopt a cross-entropy loss for training.
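
As a rough illustration of this objective, the sketch below scores a handful of candidate paths, turns their energies into a distribution with a softmax, and applies cross-entropy against the index of the right path. The number of paths and the random stand-in energies are assumptions; the real network that produces the energies is omitted.

```python
# Minimal sketch of the path-selection loss (random stand-in energies, not BriLLM's network).
import torch
import torch.nn.functional as F

num_paths = 8                                               # assumed number of candidate paths
path_energies = torch.randn(num_paths, requires_grad=True)  # stand-in for per-path energy scores
right_path = torch.tensor([3])                              # index of the ground-truth path

# Softmax over path energies + 0-1 target == cross-entropy against the right path's index.
loss = F.cross_entropy(path_energies.unsqueeze(0), right_path)
loss.backward()                                             # regular BP through the energy network
print(loss.item())
```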

## Installation
```bash
pip install torch
```

## Checkpoint
[BriLLM0.5](https://huggingface.co/BriLLM/BriLLM0.5)
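
If the `huggingface_hub` package is installed (`pip install huggingface_hub`), the checkpoint repository can be fetched locally, for example as below. The assumption here is that the `model_0.bin`, `model_1.bin`, and `vocab.json` files used in the inference example live at the top level of that repository; check the repository page for the actual layout.

```python
# Download the BriLLM0.5 repository into the current directory (file layout is an assumption).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="BriLLM/BriLLM0.5", local_dir="./")
```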

## Inference
```python
import json
import torch
from model import BraLM, Vocab

# Build the vocabulary (token-to-node mapping) from the released vocab file.
with open("./vocab.json") as f:
    node_dict = json.load(f)
vocab = Vocab.from_node_dict(node_dict)

# Instantiate the model and allocate the node/edge network for this vocabulary.
model = BraLM(hidden_size=32)
model.prepare_network(vocab)

# The checkpoint is split into two files; merge them before loading.
state_dict_0 = torch.load("model_0.bin", weights_only=True)
state_dict_1 = torch.load("model_1.bin", weights_only=True)
merged_state_dict = {**state_dict_0, **state_dict_1}
model.load_state_dict(merged_state_dict)
model.to_device("cuda:0")

# Prompt: "《罗马》描述了" ("'Roma' describes ...").
head = "《罗马》描述了"
max_token = 16 - len(head)

# Encode the prompt as a sequence of directed edges (character -> next character).
start = [vocab(head[i] + '->' + head[i + 1]) for i in range(len(head) - 1)]
ret = model.decode(start, vocab, max_token)

# Decode the returned edge sequence back into text.
decode_tuple_list = [vocab.decode(p) for p in ret]
decode_sentence = decode_tuple_list[0][0] + "".join([p[-1] for p in decode_tuple_list])
print(decode_sentence)
```