tokenizer note
README.md
Training Notes:
```
# 1. dbrx trains like a much smaller model (~7B)
# start with this as a reference point and move up or down based on eval/train loss
learning_rate = 1.5e-5

# 2. due to the nature of BPE (tiktoken), tokenizer expansion/resize is not very friendly to training
# use text-based special tokens if you need extra tokens, to avoid bad train/eval losses
```
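Note 2 can be sketched in plain Python. The marker format below (`### role ###`) is a hypothetical example, not this repo's convention; the point is only that a distinctive plain-text string is split by BPE into sub-tokens the vocabulary already contains, so the embedding matrix never has to be resized and no untrained embedding rows are introduced:

```python
# Sketch: use plain-text delimiters instead of adding new special tokens
# to the tiktoken vocab. BPE splits these markers into existing sub-tokens
# whose embeddings are already trained, so no tokenizer/embedding resize
# is needed. The "### ... ###" format is an arbitrary illustration.

def wrap_turn(role: str, text: str) -> str:
    # Any distinctive plain-text string works as a "text-based special token".
    return f"### {role} ###\n{text}\n### end ###\n"

sample = wrap_turn("user", "Explain BPE merges.")
print(sample)
```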

Known Issues: