---
tags:
- generated_from_trainer
datasets:
- Graphcore/wikipedia-bert-128
- Graphcore/wikipedia-bert-512
model-index:
- name: Graphcore/bert-base-uncased
  results: []
---
# Graphcore/bert-base-uncased
This is a BERT-Base model pre-trained in two phases: phase 1 on the [Graphcore/wikipedia-bert-128](https://huggingface.co/datasets/Graphcore/wikipedia-bert-128) dataset (sequence length 128) and phase 2 on the [Graphcore/wikipedia-bert-512](https://huggingface.co/datasets/Graphcore/wikipedia-bert-512) dataset (sequence length 512).
## Model description
Pre-trained BERT Base model trained on Wikipedia data.
## Intended uses & limitations
More information needed
## Training and evaluation data
Trained on two Wikipedia datasets:
- [Graphcore/wikipedia-bert-128](https://huggingface.co/datasets/Graphcore/wikipedia-bert-128)
- [Graphcore/wikipedia-bert-512](https://huggingface.co/datasets/Graphcore/wikipedia-bert-512)
## Training procedure
Pre-trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives, following the scheme from [Large Batch Optimization for Deep Learning: Training BERT in 76 minutes](https://arxiv.org/abs/1904.00962).
Training ran on 16 Graphcore Mk2 IPUs.
Command lines:
Phase 1:
```
python examples/language-modeling/run_pretraining.py \
  --config_name bert-base-uncased \
  --tokenizer_name bert-base-uncased \
  --do_train \
  --logging_steps 5 \
  --max_seq_length 128 \
  --ipu_config_name Graphcore/bert-base-ipu \
  --dataset_name Graphcore/wikipedia-bert-128 \
  --max_steps 10500 \
  --is_already_preprocessed \
  --dataloader_num_workers 64 \
  --dataloader_mode async_rebatched \
  --lamb \
  --lamb_no_bias_correction \
  --per_device_train_batch_size 32 \
  --gradient_accumulation_steps 512 \
  --learning_rate 0.006 \
  --lr_scheduler_type linear \
  --loss_scaling 16384 \
  --weight_decay 0.01 \
  --warmup_ratio 0.28 \
  --save_steps 100 \
  --config_overrides "layer_norm_eps=0.001" \
  --ipu_config_overrides "device_iterations=1" \
  --output_dir output-pretrain-bert-base-phase1
```
Phase 2:
```
python examples/language-modeling/run_pretraining.py \
  --config_name bert-base-uncased \
  --tokenizer_name bert-base-uncased \
  --model_name_or_path ./output-pretrain-bert-base-phase1 \
  --do_train \
  --logging_steps 5 \
  --max_seq_length 512 \
  --ipu_config_name Graphcore/bert-base-ipu \
  --dataset_name Graphcore/wikipedia-bert-512 \
  --max_steps 2038 \
  --is_already_preprocessed \
  --dataloader_num_workers 128 \
  --dataloader_mode async_rebatched \
  --lamb \
  --lamb_no_bias_correction \
  --per_device_train_batch_size 8 \
  --gradient_accumulation_steps 512 \
  --learning_rate 0.002828 \
  --lr_scheduler_type linear \
  --loss_scaling 128.0 \
  --weight_decay 0.01 \
  --warmup_ratio 0.128 \
  --config_overrides "layer_norm_eps=0.001" \
  --ipu_config_overrides "device_iterations=1,embedding_serialization_factor=2,matmul_proportion=0.22" \
  --output_dir output-pretrain-bert-base-phase2
```
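Both commands combine `--lr_scheduler_type linear` with a `--warmup_ratio`. The sketch below shows the shape such a schedule usually takes in Transformers: a linear ramp from zero to the peak learning rate, then a linear decay back to zero. It rests on two assumptions: that the training script uses this standard schedule, and that the warmup step count is the ceiling of `max_steps * warmup_ratio`. `linear_lr` and `PHASES` are illustrative names, not part of the training script.

```python
import math


def linear_lr(step, peak_lr, total_steps, warmup_steps):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Decay from peak_lr down to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Values copied from the two command lines above.
PHASES = {
    "phase 1": {"peak_lr": 0.006, "total_steps": 10500, "warmup_ratio": 0.28},
    "phase 2": {"peak_lr": 0.002828, "total_steps": 2038, "warmup_ratio": 0.128},
}

for name, p in PHASES.items():
    # Assumption: warmup steps are derived from the ratio by rounding up.
    warmup_steps = math.ceil(p["total_steps"] * p["warmup_ratio"])
    for step in (0, warmup_steps // 2, warmup_steps, p["total_steps"]):
        lr = linear_lr(step, p["peak_lr"], p["total_steps"], warmup_steps)
        print(f"{name} step {step:>5}: lr = {lr:.6f}")
```

Printing a few points per phase is a quick check that a chosen warmup ratio puts the peak where you expect it before committing to a long run.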
### Training hyperparameters
The following hyperparameters were used during phase 1 training:
- learning_rate: 0.006
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- distributed_type: IPU
- gradient_accumulation_steps: 512
- total_train_batch_size: 65536
- total_eval_batch_size: 128
- optimizer: LAMB
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.28
- training_steps: 10500
- training precision: Mixed Precision
The following hyperparameters were used during phase 2 training:
- learning_rate: 0.002828
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: IPU
- gradient_accumulation_steps: 512
- total_train_batch_size: 16384
- total_eval_batch_size: 128
- optimizer: LAMB
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.128
- training_steps: 2038
- training precision: Mixed Precision
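The `total_train_batch_size` values above are consistent with the per-device batch size multiplied by the gradient accumulation steps and by a number of model replicas. This card does not state the replica count, so the sketch below infers it from the listed totals; treat that as an assumption. `PhaseConfig` and its fields are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    per_device_batch_size: int
    gradient_accumulation_steps: int
    total_train_batch_size: int
    max_seq_length: int
    training_steps: int

    @property
    def inferred_replicas(self) -> int:
        """Replica count implied by the listed totals (an inference, not a documented value)."""
        per_replica = self.per_device_batch_size * self.gradient_accumulation_steps
        replicas, remainder = divmod(self.total_train_batch_size, per_replica)
        if remainder:
            raise ValueError("total batch size is not a multiple of the per-replica batch")
        return replicas

    @property
    def token_positions_per_step(self) -> int:
        """Sequence positions (padding included) consumed per optimizer step."""
        return self.total_train_batch_size * self.max_seq_length

    @property
    def sequences_seen(self) -> int:
        """Total sequences processed over the whole phase."""
        return self.total_train_batch_size * self.training_steps


# Values copied from the hyperparameter lists above.
phase1 = PhaseConfig(32, 512, 65536, 128, 10500)
phase2 = PhaseConfig(8, 512, 16384, 512, 2038)

for name, cfg in (("phase 1", phase1), ("phase 2", phase2)):
    print(name, cfg.inferred_replicas, cfg.token_positions_per_step, cfg.sequences_seen)
```

Under these assumptions both phases imply 4 replicas and the same number of token positions per optimizer step: phase 2 uses sequences four times longer and a batch four times smaller.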
### Framework versions
- Transformers 4.17.0.dev0
- PyTorch 1.10.0+cpu
- Datasets 1.18.3.dev0
- Tokenizers 0.10.3