By sharding the model parameters, gradients, and optimizer states across workers, and even offloading them to the CPU when they're inactive, FSDP can substantially reduce the memory cost of large-scale training.
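As a minimal sketch of what this looks like in practice, the snippet below wraps a stand-in model with PyTorch's `FullyShardedDataParallel` and enables parameter offloading via `CPUOffload`. It assumes a process group launched with `torchrun`; the model and hyperparameters are placeholders for illustration.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel as FSDP

# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = nn.Transformer(d_model=512, nhead=8)  # stand-in model for illustration

# FSDP shards parameters, gradients, and optimizer state across ranks.
# CPUOffload(offload_params=True) additionally keeps each rank's shard in
# host memory when it is not needed on the GPU, trading transfer time
# for a smaller device-memory footprint.
fsdp_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))

optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)
```

From here the training loop proceeds as usual; FSDP gathers the full parameters of each wrapped module just before its forward and backward passes and frees them afterward.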