Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
raw
history blame contribute delete
620 Bytes
Transformers and DeepSpeed provide two of the same schedulers:
WarmupLR is the same as --lr_scheduler_type constant_with_warmup in Transformers
WarmupDecayLR is the same as --lr_scheduler_type linear in Transformers (this is the default scheduler used in Transformers)
If you don't configure the scheduler in the config, the [Trainer] automatically selects WarmupDecayLR and either uses the supplied values or the default values for the following parameters from the command line: warmup_min_lr, warmup_max_lr, warmup_num_steps, total_num_steps (automatically calculated during run time if max_steps is not provided).