Even worse, if you are using torch.distributed to launch a distributed training run, each process will load the pretrained model and store these two copies in RAM.
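As a minimal sketch of the problem (assuming a script launched with `torchrun` and an illustrative checkpoint name, `bigscience/bloom-3b`), every process spawned by the launcher executes the same loading code, so peak host RAM scales with the number of processes:

```python
# launch_demo.py — run with e.g.: torchrun --nproc_per_node=4 launch_demo.py
import os

from transformers import AutoModelForCausalLM

rank = int(os.environ.get("RANK", 0))

# Every process spawned by torchrun executes this line independently.
# Each one materializes the randomly initialized model *and* the loaded
# checkpoint weights, so peak host RAM is roughly
#   num_processes x 2 x model_size.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")
print(f"rank {rank}: model loaded")
```

With 4 processes and a model whose weights occupy 6 GB, this naive pattern can transiently require on the order of 4 × 2 × 6 GB = 48 GB of host RAM just to get the model loaded.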