Update README.md

---
datasets:
- nilq/babylm-10M
language:
- en
---

This autoregressive model belongs to a series of rather small language models trained on the [BabyLM](https://babylm.github) data:
- the [baby_llama](https://huggingface.co/bbunzeck/baby_llama) model has few parameters and was trained on a small data set (10M tokens)
- the [**t**eenie_llama](https://huggingface.co/bbunzeck/teenie_llama) model has the same number of parameters but was trained on more **t**okens of text (100M)
- the [**w**eenie_llama](https://huggingface.co/bbunzeck/weenie_llama) model was trained on the small data set, but has more parameters/**w**eights
- the [**tw**eenie_llama](https://huggingface.co/bbunzeck/tweenie_llama) model features both -- more **t**okens (the larger data set) and more **w**eights (*viz.* parameters)
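
All four checkpoints are linked above on the Hugging Face Hub. Assuming the repositories ship standard `transformers`-compatible files (config, weights, tokenizer), a minimal loading sketch could look like the following; the prompt and generation settings are purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick one of the four repositories listed above; baby_llama is used here,
# and the other three should load the same way.
repo_id = "bbunzeck/baby_llama"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short greedy continuation of a toy prompt.
inputs = tokenizer("The little bird", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```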
|                 | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters      | 2.97M      | 2.97M        | 11.44M       | 11.44M        |
| Hidden layers   | 8          | 8            | 16           | 16            |
| Attention heads | 8          | 8            | 16           | 16            |
| Embedding size  | 128        | 128          | 256          | 256           |
| Context size    | 128        | 128          | 256          | 256           |
| Vocab size      | 16k        | 16k          | 16k          | 16k           |
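
If the model names are taken literally and these are standard Llama-style architectures, the baby_llama column would roughly translate into a `transformers` `LlamaConfig` like the sketch below; the field mapping is an assumption based on the table, not read from the actual repository config:

```python
from transformers import LlamaConfig

# Hypothetical reading of the baby_llama column as a LlamaConfig;
# values come from the table above, the field mapping is an assumption.
baby_llama_config = LlamaConfig(
    vocab_size=16_000,            # Vocab size: 16k
    hidden_size=128,              # Embedding size: 128
    num_hidden_layers=8,          # Hidden layers: 8
    num_attention_heads=8,        # Attention heads: 8
    max_position_embeddings=128,  # Context size: 128
)
```

The weenie/tweenie column would double these values (16 layers, 16 heads, embedding and context size 256) while keeping the 16k vocabulary.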