Unlike DistributedDataParallel (DDP), which replicates the full model on each GPU, FSDP reduces memory usage by sharding the model's parameters, gradients, and optimizer states across GPUs.
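
As a rough illustration of where the savings come from, the sketch below estimates model-state memory per GPU. It is back-of-the-envelope arithmetic, not a measurement: the function name `per_gpu_state_bytes` and its parameters are hypothetical, and it assumes fp32 parameters and gradients (4 bytes each), an Adam-style optimizer (8 bytes of state per parameter), and full sharding with an even split. It ignores activations, buffers, and the temporary memory FSDP needs when it gathers parameters for a layer.

```python
def per_gpu_state_bytes(num_params, world_size, sharded,
                        param_bytes=4, grad_bytes=4, optim_bytes=8):
    """Estimate bytes of model state held on one GPU (hypothetical helper).

    Assumes fp32 params/grads and an Adam-style optimizer by default.
    Activations and temporary all-gathered parameters are not counted.
    """
    total = num_params * (param_bytes + grad_bytes + optim_bytes)
    # DDP: every GPU keeps the full state.
    # FSDP (full sharding, assumed even split): each GPU keeps 1/world_size.
    return total // world_size if sharded else total


num_params = 1_000_000_000  # illustrative 1B-parameter model
world_size = 8              # illustrative number of GPUs

ddp_bytes = per_gpu_state_bytes(num_params, world_size, sharded=False)
fsdp_bytes = per_gpu_state_bytes(num_params, world_size, sharded=True)

print(f"DDP:  {ddp_bytes / 1e9:.1f} GB per GPU")
print(f"FSDP: {fsdp_bytes / 1e9:.1f} GB per GPU")
```

Under these assumptions, a 1B-parameter model on 8 GPUs needs about 16 GB of model state per GPU with DDP and about 2 GB with FSDP. Real usage will be higher in both cases once activations and communication buffers are included.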