You can find the combined version here: [duongttr/vi-dataset-for-pretrain](https://huggingface.co/datasets/duongttr/vi-dataset-for-pretrain)

## Hyperparameters & Results

We trained the model for ~100k steps with `lr=1e-4`, `bs=2560` (`single_batch_size=32` * `num_core=8` * `grad_cum=10`), and `optimizer=adamw` on a TPU v3-8 VM from the [TRC Program](https://sites.research.google/trc/about/). Training took around **1 day**.

|Model|Eval Loss|Eval Perplexity|
|---|---|---|
|**gpt2-base**|**3.939**|**51.35**|
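As a sanity check, both the effective batch size and the eval perplexity follow directly from the numbers above: the global batch is the per-core batch times the core count times the gradient-accumulation steps, and perplexity is the exponential of the mean cross-entropy loss. A minimal sketch (the variable names mirror the identifiers in the hyperparameter line):

```python
import math

# Effective global batch size: per-core batch * TPU cores * grad accumulation.
single_batch_size = 32
num_core = 8
grad_cum = 10
bs = single_batch_size * num_core * grad_cum
print(bs)  # 2560

# Perplexity = exp(mean cross-entropy loss); recovers the reported eval
# perplexity (small difference comes from rounding the loss to 3 decimals).
eval_loss = 3.939
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ~51.4, consistent with the 51.35 in the table
```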