Questions about pipeline parallelism
#103
by ink0215 · opened
My question is about the description of the activation memory for pipeline parallelism.

Since each GPU only holds part of the model's layers, the activations should also be distributed among the ranks, right? So from my perspective, pipeline parallelism should reduce the activation memory on each GPU rank during training.
Am I right?
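
To make my reasoning concrete, here is a minimal back-of-envelope sketch (the function, the uniform per-layer cost, and all numbers are illustrative assumptions on my part, not from the article):

```python
# Hypothetical estimate of forward-pass activation memory on one pipeline rank.
# Assumes every layer produces the same number of activation bytes.

def activation_memory_per_rank(total_layers: int,
                               pp_ranks: int,
                               act_bytes_per_layer: float,
                               microbatches_in_flight: int = 1) -> float:
    """Activation bytes held by one rank: it stores activations only for its
    own total_layers / pp_ranks layers, but possibly for several in-flight
    microbatches at once."""
    layers_per_rank = total_layers / pp_ranks
    return layers_per_rank * act_bytes_per_layer * microbatches_in_flight

# Example: a 32-layer model with 1 GiB of activations per layer.
print(activation_memory_per_rank(32, 1, 2**30) / 2**30)  # 32.0 GiB, no PP
print(activation_memory_per_rank(32, 4, 2**30) / 2**30)  #  8.0 GiB, PP=4
```

The `microbatches_in_flight` knob is there because, with schedules like GPipe, early stages buffer activations for several microbatches during warm-up, so the per-rank saving can be smaller than the naive `1 / pp_ranks` factor.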