Even worse, if you use torch.distributed to launch distributed training, each process loads the pretrained model and stores these two copies in RAM, so peak memory scales with the number of processes.
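As a rough back-of-the-envelope sketch of why this hurts, the helper below (the function name and the example sizes are illustrative, not part of any library) estimates peak CPU RAM when every process holds two full copies of the weights, the loaded state dict plus the instantiated model:

```python
def peak_cpu_ram_bytes(num_params: int, dtype_bytes: int, world_size: int) -> int:
    """Rough peak CPU RAM when each process keeps two full copies of the
    weights (the loaded state dict plus the instantiated model)."""
    copies_per_process = 2
    return num_params * dtype_bytes * copies_per_process * world_size

# Example: a 7B-parameter model in float32 launched across 8 processes
gb = peak_cpu_ram_bytes(7_000_000_000, dtype_bytes=4, world_size=8) / 1e9
print(f"~{gb:.0f} GB of CPU RAM")  # roughly 448 GB on a single node
```

The estimate ignores activations and framework overhead; it only shows how the two weight copies multiply across processes.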