Update README.md
README.md
CHANGED
@@ -42,6 +42,7 @@ Tokenized String: This Ġis Ġa Ġgame
 ### Recommendations
 
 When tokenizing Greek, Greek tokens may appear as gibberish, but actually this does not impact the downstream model pretraining.
+
 (An improved version of this tokenizer, without the gibberish Greek tokens can be found here: gsar78/Greek_Tokenizer)
 
 Can be used a good start for pretraining a GPT-based model or any other model using BPE.
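
Note on the "gibberish" Greek tokens: the `Ġ` marker in the hunk header suggests a byte-level BPE tokenizer in the style of GPT-2, although the README does not say so explicitly. Under that assumption, each UTF-8 byte is shown as one printable stand-in character, so multi-byte Greek letters look unreadable while the underlying bytes are preserved. The sketch below is standard-library Python only and does not load this tokenizer; the helper names `to_display` and `from_display` are hypothetical and exist only for illustration.

```python
def bytes_to_unicode():
    """Build a byte -> printable character table in the style of GPT-2's byte-level BPE."""
    # Bytes that are already printable keep their own character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (space, control bytes, ...) gets a stand-in code point >= 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_display(text):
    """Hypothetical helper: show text the way a byte-level vocabulary would display it."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def from_display(display):
    """Hypothetical helper: invert to_display and recover the original text."""
    return bytes(CHAR_TO_BYTE[c] for c in display).decode("utf-8")


sample = " παιχνίδι"          # Greek for "game", with a leading space
shown = to_display(sample)     # unreadable stand-in characters
restored = from_display(shown)

print(shown)
print(restored)
print(to_display(" is"))       # the space byte is displayed as "Ġ"
```

Because the mapping is a one-to-one table over all 256 byte values, the displayed form is only a cosmetic encoding: the round trip recovers the Greek text exactly, which is consistent with the README's claim that the odd-looking tokens do not affect pretraining.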