Commit · c47476f
1 Parent(s): 90e5136
Update README.md
README.md CHANGED
@@ -5,9 +5,9 @@ license: mit
 
 The documentation and code is available on Github [alasdairforsythe/tokenmonster](https://github.com/alasdairforsythe/tokenmonster).
 
-The
+The pretrained vocabularies are all available for download [here](https://huggingface.co/alasdairforsythe/tokenmonster/tree/main/vocabs).
 
-**July
+**July 11:** TokenMonster v1.1.1 has been released. The "420" prebuilt vocabularies are being released as they are completed, at a rate of around 10 per day.
 
 Choose a dataset from:
 
@@ -44,4 +44,8 @@ And finally add the version number:
 
 Examples:
 - `fiction-24000-consistent-v1`
-- `code-4096-clean-nocapcode-v1`
+- `code-4096-clean-nocapcode-v1`
+
+There are two additional vocabularies:
+- `gpt2`
+- `llama`
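The two example names in the diff suggest a dash-separated pattern: dataset, vocabulary size, mode, optional flags, then a version. The sketch below splits a name along those lines. It is only an illustration inferred from the examples shown here; the README lines between the hunks are not visible, so the field names (`dataset`, `size`, `mode`, `options`, `version`) and the helper `parse_vocab_name` are hypothetical and not part of TokenMonster itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Names the README lists as additional vocabularies; they do not
# follow the dashed pattern, so they are treated as standalone.
STANDALONE = {"gpt2", "llama"}


@dataclass(frozen=True)
class VocabName:
    # Field names are assumptions inferred from the README examples.
    dataset: str
    size: Optional[int]
    mode: Optional[str]
    options: Tuple[str, ...]
    version: Optional[str]


def parse_vocab_name(name: str) -> VocabName:
    """Split a vocabulary name such as 'code-4096-clean-nocapcode-v1'."""
    if name in STANDALONE:
        return VocabName(name, None, None, (), None)

    parts = name.split("-")
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 dash-separated parts: {name!r}")

    # First three parts and the last part are fixed; anything in
    # between is treated as an optional flag (e.g. 'nocapcode').
    dataset, size, mode, *options, version = parts
    if not size.isdigit():
        raise ValueError(f"vocabulary size is not a number: {size!r}")
    if not (version.startswith("v") and version[1:].isdigit()):
        raise ValueError(f"version does not look like 'v1': {version!r}")

    return VocabName(dataset, int(size), mode, tuple(options), version)
```

For example, `parse_vocab_name("code-4096-clean-nocapcode-v1")` yields a size of `4096` and a single option, `nocapcode`, while `gpt2` and `llama` are returned with only the `dataset` field set.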