| For example, to estimate the memory requirements for the bigscience/T0_3B model on a single GPU: | |
| $ python -c 'from transformers import AutoModel; \ | |
| from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live; \ | |
| model = AutoModel.from_pretrained("bigscience/T0_3B"); \ | |
| estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=1, num_nodes=1)' | |
| [...] | |
| Estimated memory needed for params, optim states and gradients for a: | |
| HW: Setup with 1 node, 1 GPU per node. |