Commit 1a735fe · Parent: f80e757
Update README.md

README.md (CHANGED)
# Polyglot-Ko-1.3B

## Model Description

Polyglot-Ko is a series of large-scale Korean autoregressive language models made by the EleutherAI polyglot team. Polyglot-Ko-1.3B is the first and smallest model in the series.

| Hyperparameter        | Value                                                                                                                                   |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| \\(n_{heads}\\)       | 16                                                                                                                                      |
| \\(d_{head}\\)        | 128                                                                                                                                     |
| \\(n_{ctx}\\)         | 2048                                                                                                                                    |
| \\(n_{vocab}\\)       | 30,003 / 30,080                                                                                                                         |
| Positional Encoding   | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864)                                                                   |
| RoPE Dimensions       | [64](https://github.com/kingoflolz/mesh-transformer-jax/blob/f2aa66e0925de6593dcbb70e72399b97b4130482/mesh_transformer/layers.py#L223) |

The model consists of 24 transformer layers with a model dimension of 2048 and a feedforward dimension of 8192. The model dimension is split into 16 heads, each with a dimension of 128. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 30,003.
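
A usage snippet is not part of this card; purely as an illustrative sketch, loading the checkpoint with the Hugging Face `transformers` library might look like the code below. The hub id `EleutherAI/polyglot-ko-1.3b`, the fp16 dtype, and the sampling settings are assumptions, not documented values.

```python
# Minimal, illustrative sketch (not from this card): load the checkpoint and
# sample a continuation with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/polyglot-ko-1.3b"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# The tokenizer exposes the 30,003-entry vocabulary; the embedding matrix may be
# padded to 30,080 rows (the second value in the table above) -- a common
# efficiency choice, assumed here rather than documented.
print(len(tokenizer), model.get_input_embeddings().weight.shape[0])

prompt = "한국어 언어 모델은"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
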
## Training data

* `<|tell|>` : phone number
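
The preprocessing pipeline itself is not included in this card; as a rough, hypothetical sketch of this kind of masking, replacing phone numbers with the `<|tell|>` token could look like the following (the regular expression and helper name are illustrative assumptions, not the project's actual code):

```python
import re

# Simplified, assumed pattern for Korean phone numbers; the real pipeline may differ.
PHONE_RE = re.compile(r"\b0\d{1,2}-\d{3,4}-\d{4}\b")

def mask_phone_numbers(text: str) -> str:
    """Replace phone numbers with the <|tell|> placeholder token."""
    return PHONE_RE.sub("<|tell|>", text)

print(mask_phone_numbers("문의: 010-1234-5678"))  # -> 문의: <|tell|>
```
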
### Limitations and Biases

The core functionality of Polyglot-Ko is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. When prompting Polyglot-Ko, it is important to remember that the statistically most likely next token is often not the token that produces the most "accurate" text. Never depend upon Polyglot-Ko to produce factually accurate output. Depending upon the use case, Polyglot-Ko may produce socially unacceptable text.

As with all language models, it is hard to predict in advance how Polyglot-Ko will respond to particular prompts, and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.

### Legal Restrictions

Since there are laws in many countries related to data collection, we will collect data with due regard to the laws of those countries.
Additionally, we plan to use this dataset to train our models, but we do not plan to make the dataset publicly available.

## Evaluation results

We used the [KOBEST dataset](https://arxiv.org/abs/2204.04541), which consists of five Korean downstream tasks, for evaluation.
We added those tasks to [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and utilized the prompt templates described in the paper.
We evaluated our model, along with two other Korean language models, skt/ko-gpt-trinity-1.2B-v0.5 and kakaobrain/kogpt, for comparison.
The following tables show the results for different numbers of few-shot examples. You can reproduce these results using the [polyglot branch of lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot) and the following scripts.

```console
python main.py \