You can find the combined version here: [duongttr/vi-dataset-for-pretrain](https://huggingface.co/datasets/duongttr/vi-dataset-for-pretrain)

## Hyperparameters & Results

We trained the model for ~100k steps with `lr=1e-4`, `bs=2560` (`single_batch_size=32` * `num_core=8` * `grad_cum=10`), and `optimizer=adamw` on a TPU v3-8 VM from the [TRC Program](https://sites.research.google/trc/about/). Training took around **1 day**.

|Model|Eval Loss|Eval Perplexity|
|---|---|---|
|**gpt2-base**|**3.939**|**51.35**|
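As a sanity check, both the effective batch size and the eval perplexity follow directly from the numbers above: the global batch is the per-core batch times the core count times the gradient-accumulation steps, and perplexity is the exponential of the mean cross-entropy loss. A minimal sketch (the variable names mirror the identifiers in the hyperparameter line):

```python
import math

# Effective global batch size: per-core batch * TPU cores * grad accumulation.
single_batch_size = 32
num_core = 8
grad_cum = 10
bs = single_batch_size * num_core * grad_cum
print(bs)  # 2560

# Perplexity = exp(mean cross-entropy loss); recovers the reported eval
# perplexity (small difference comes from rounding the loss to 3 decimals).
eval_loss = 3.939
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ~51.4, consistent with the 51.35 in the table
```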