GPU
When you train bigger models, you have essentially three options:
bigger GPUs
more GPUs
more CPU memory and NVMe storage (which DeepSpeed-Infinity can offload to)
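As a rough illustration of the third option, the sketch below builds a minimal DeepSpeed configuration that enables ZeRO stage 3 with optimizer and parameter offload to NVMe. The path `/local_nvme` and the helper name `build_offload_config` are hypothetical placeholders, and a real setup would likely need further tuning (batch sizes, precision, buffer settings) that is not shown here.

```python
import json

# Hypothetical mount point of a fast local NVMe drive; adjust for your machine.
NVME_PATH = "/local_nvme"


def build_offload_config(nvme_path: str = NVME_PATH) -> dict:
    """Return a minimal ZeRO stage 3 config that offloads to NVMe (a sketch)."""
    return {
        "zero_optimization": {
            "stage": 3,
            # Move optimizer states off the GPU.
            "offload_optimizer": {"device": "nvme", "nvme_path": nvme_path},
            # Move model parameters off the GPU.
            "offload_param": {"device": "nvme", "nvme_path": nvme_path},
        }
    }


config = build_offload_config()
# Print the config as JSON, the format DeepSpeed config files use.
print(json.dumps(config, indent=2))
```

Setting `"device"` to `"cpu"` instead (and dropping `"nvme_path"`) would target CPU memory rather than NVMe.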
Let's start with the case where you have a single GPU.