Update README.md
README.md
@@ -9,15 +9,14 @@ Special thanks to https://huggingface.co/fahadh4ilyas
 convert_v2.py
 ```
 
-Training Notes:
+Training Notes/Observations:
+
+1. dbrx trains like a much smaller model (~7B)
 ```
-# 1. dbrx trains like a much smaller model (~7B)
 # start with this as reference point and move up or down based on eval/train loss
 learning_rate = 1.5e-5
-
-# 2. due to BPE (tiktoken) nature, tokenizer expansion/resize is not very friendly to training
-# use text based special tokens if you need/use extra tokens to avoid bad train/eval losses
 ```
+2. Due to nature of BPE (tiktoken), tokenizer expansion/resize is not very friendly to training. Use text based special tokens if you need/use extra tokens to avoid bad train/eval losses
 
 Known Issues:
 
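One way to read the second training note is sketched below: role markers are kept as ordinary text, so the existing BPE vocabulary tokenizes them and no tokenizer expansion or embedding resize is needed. The marker strings and the helper names `format_example` and `split_example` are hypothetical, chosen only for illustration; they are not part of this repository or of any tokenizer library.

```python
# Hypothetical plain-text markers (an assumption, not taken from the README).
# They are ordinary strings, so no new token IDs are added to the vocabulary.
USER_MARK = "### User:"
ASSISTANT_MARK = "### Assistant:"


def format_example(user_text: str, assistant_text: str) -> str:
    """Join one user/assistant turn using plain-text markers."""
    return (
        f"{USER_MARK}\n{user_text.strip()}\n\n"
        f"{ASSISTANT_MARK}\n{assistant_text.strip()}"
    )


def split_example(formatted: str) -> tuple[str, str]:
    """Recover the two turns from a string produced by format_example.

    Splits at the first assistant marker, so user text that itself
    contains the marker would be split incorrectly.
    """
    head, _, tail = formatted.partition(f"\n\n{ASSISTANT_MARK}\n")
    return head.removeprefix(f"{USER_MARK}\n"), tail


if __name__ == "__main__":
    sample = format_example("What is DBRX?", "A mixture-of-experts language model.")
    print(sample)
    print(split_example(sample))
```

The trade-off is that text markers cost a few extra tokens per turn and can collide with user content, but the embedding matrix keeps its original size.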