Upload README.md
README.md
CHANGED
@@ -40,22 +40,22 @@ I use A100 GPU 40GB and COLAB, when training.

 | Model Name | Vocabulary Size | Description |
 | --- | --- | --- |
-| Original Platypus2 |
-| **Expanded KO-Platypus-ex** |
+| Original Platypus2 | 32000 | Sentencepiece BPE |
+| **Expanded KO-Platypus-ex** | 46336 | Sentencepiece BPE. Added Korean vocab and merges |

 **Tokenizing "안녕하세요, 오늘은 날씨가 좋네요."**

 | Model | Tokens |
 | --- | --- |
-| Platypus2-7b | `[
-| KO-Platypus2-7b-ex | `[
+| Platypus2-7b | `['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '<0xEB>', '<0x82>', '<0xA0>', '씨', '가', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '요', '.']` |
+| KO-Platypus2-7b-ex | `['▁안녕', '하세요', ',', '▁오늘은', '▁날', '씨가', '▁좋네요', '.']` |

 **Tokenizing "Platypus: Quick, Cheap, and Powerful Refinement of LLMs"**

 | Model | Tokens |
 | --- | --- |
-| Platypus2-7b | `[
-| KO-Platypus2-7b-ex | `[
+| Platypus2-7b | `['▁Plat', 'yp', 'us', ':', '▁Quick', ',', '▁Che', 'ap', ',', '▁and', '▁Power', 'ful', '▁Re', 'fin', 'ement', '▁of', '▁L', 'LM', 's']` |
+| KO-Platypus2-7b-ex | `['▁Plat', 'yp', 'us', ':', '▁Quick', ',', '▁Che', 'ap', ',', '▁and', '▁Power', 'ful', '▁Re', 'fin', 'ement', '▁of', '▁L', 'LM', 's']` |

 # **Model Benchmark**

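The `<0xEB>`-style entries in the Platypus2-7b row of the Korean tokenization table are byte-fallback tokens: a syllable missing from the 32000-entry vocabulary is spelled out as its UTF-8 bytes. The sketch below illustrates that effect for the sentence "안녕하세요, 오늘은 날씨가 좋네요." (roughly "Hello, the weather is nice today."). It is a toy per-character tokenizer, not the SentencePiece BPE algorithm the README names, and `toy_vocab` is a hypothetical vocabulary chosen to contain only the characters that appear whole in that table row.

```python
def byte_fallback(ch: str) -> list[str]:
    """Spell one character as byte tokens in the '<0xNN>' style shown in the table."""
    return [f"<0x{b:02X}>" for b in ch.encode("utf-8")]


def tokenize_chars(text: str, vocab: set[str]) -> list[str]:
    """Toy tokenizer: a character in the vocabulary stays whole, any other becomes bytes."""
    tokens: list[str] = []
    for ch in text:
        tokens.extend([ch] if ch in vocab else byte_fallback(ch))
    return tokens


# Hypothetical vocabulary (an assumption for illustration only): the characters
# that the Platypus2-7b row shows as single tokens.
toy_vocab = set("▁안하세요오은씨가,.")

# SentencePiece-style preprocessing: spaces become '▁' and one is prepended.
text = "▁" + "안녕하세요, 오늘은 날씨가 좋네요.".replace(" ", "▁")

tokens = tokenize_chars(text, toy_vocab)

# Tokens copied from the KO-Platypus2-7b-ex row of the table.
ko_tokens = ["▁안녕", "하세요", ",", "▁오늘은", "▁날", "씨가", "▁좋네요", "."]

print(byte_fallback("녕"))  # ['<0xEB>', '<0x85>', '<0x95>']
print(len(tokens), len(ko_tokens))  # 30 8
```

Each of the five syllables outside the toy vocabulary (녕, 늘, 날, 좋, 네) costs three byte tokens, which is why the original tokenizer needs 30 tokens where the expanded one needs 8. The English sentence is unaffected because its subwords are already in the original vocabulary.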