r"""
Module ``torch.distributed.launch``.

``torch.distributed.launch`` is a module that spawns multiple distributed
training processes on each of the training nodes.

.. warning::

    This module is going to be deprecated in favor of :ref:`torchrun <launcher-api>`.

The utility can be used for single-node distributed training, in which one or
more processes per node will be spawned. It can be used for either CPU or GPU
training. If the utility is used for GPU training, each distributed process
will operate on a single GPU, which can substantially improve single-node
training performance. It can also be used in multi-node distributed training,
by spawning multiple processes on each node to improve multi-node training
performance as well. This is especially beneficial for systems with multiple
InfiniBand interfaces that have direct-GPU support, since all of them can be
utilized for aggregated communication bandwidth.

In both the single-node and multi-node cases, this utility launches the given
number of processes per node (``--nproc-per-node``). If used for GPU training,
this number needs to be less than or equal to the number of GPUs on the
current system, and each process will operate on a single GPU from *GPU 0 to
GPU (nproc_per_node - 1)*.
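To make the per-node layout concrete, here is a small illustrative sketch (the
function is hypothetical, not part of this module) of how a node rank and a
local rank combine into a process's global rank:

```python
# Illustration only: each node runs nproc_per_node processes, and a
# process's global rank is derived from its node rank and local rank.
def global_rank(node_rank, nproc_per_node, local_rank):
    return node_rank * nproc_per_node + local_rank

# e.g. two nodes with 4 processes each: node 1, local rank 2 -> global rank 6
print(global_rank(1, 4, 2))
```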
**How to use this module:**

1. Single-Node multi-process distributed training

::

    python -m torch.distributed.launch --nproc-per-node=NUM_GPUS_YOU_HAVE
               YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other
               arguments of your training script)

2. Multi-Node multi-process distributed training: (e.g. two nodes)

Node 1: *(IP: 192.168.1.1, and has a free port: 1234)*

::

    python -m torch.distributed.launch --nproc-per-node=NUM_GPUS_YOU_HAVE
               --nnodes=2 --node-rank=0 --master-addr="192.168.1.1"
               --master-port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
               and all other arguments of your training script)

Node 2:

::

    python -m torch.distributed.launch --nproc-per-node=NUM_GPUS_YOU_HAVE
               --nnodes=2 --node-rank=1 --master-addr="192.168.1.1"
               --master-port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
               and all other arguments of your training script)

3. To look up what optional arguments this module offers:

::

    python -m torch.distributed.launch --help
**Important Notices:**

1. This utility and multi-process distributed (single-node or
multi-node) GPU training currently only achieves the best performance using
the NCCL distributed backend. Thus the NCCL backend is the recommended
backend for GPU training.

2. In your training program, you must parse the command-line argument:
``--local-rank=LOCAL_PROCESS_RANK``, which will be provided by this module.
If your training program uses GPUs, you should ensure that your code only
runs on the GPU device of LOCAL_PROCESS_RANK. This can be done by:

Parsing the local_rank argument

::

    >>> # xdoctest: +SKIP
    >>> import argparse
    >>> parser = argparse.ArgumentParser()
    >>> parser.add_argument("--local-rank", type=int)
    >>> args = parser.parse_args()

Set your device to local rank using either

::

    >>> torch.cuda.set_device(args.local_rank)  # before your code runs

or

::

    >>> with torch.cuda.device(args.local_rank):
    >>>     # your code to run
    >>>     ...

3. In your training program, you are supposed to call the following function
at the beginning to start the distributed backend. It is strongly recommended
that ``init_method=env://``. Other init methods (e.g. ``tcp://``) may work,
but ``env://`` is the one that is officially supported by this module.

::

    >>> torch.distributed.init_process_group(backend='YOUR BACKEND',
    >>>                                      init_method='env://')

4. In your training program, you can either use regular distributed functions
or use the :func:`torch.nn.parallel.DistributedDataParallel` module. If your
training program uses GPUs for training and you would like to use
:func:`torch.nn.parallel.DistributedDataParallel`, here is how to configure it.

::

    >>> model = torch.nn.parallel.DistributedDataParallel(model,
    >>>                                                   device_ids=[args.local_rank],
    >>>                                                   output_device=args.local_rank)

Please ensure that the ``device_ids`` argument is set to the only GPU device id
that your code will be operating on. This is generally the local rank of the
process. In other words, ``device_ids`` needs to be ``[args.local_rank]``,
and ``output_device`` needs to be ``args.local_rank`` in order to use this
utility.

5. Another way to pass ``local_rank`` to the subprocesses is via the
environment variable ``LOCAL_RANK``. This behavior is enabled when you launch
the script with ``--use-env=True``. You must adjust the subprocess example
above to replace ``args.local_rank`` with ``os.environ['LOCAL_RANK']``; the
launcher will not pass ``--local-rank`` when you specify this flag.
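A minimal sketch of handling both conventions (the helper function is
hypothetical, not part of this module): prefer the ``LOCAL_RANK`` environment
variable set under ``--use-env``, and fall back to the parsed CLI value
otherwise.

```python
import os

# Hypothetical helper: resolve the local rank, preferring the LOCAL_RANK
# environment variable that the launcher sets when --use-env is passed,
# and falling back to the value parsed from --local-rank.
def resolve_local_rank(cli_local_rank=None):
    env_rank = os.environ.get("LOCAL_RANK")
    if env_rank is not None:
        return int(env_rank)
    return cli_local_rank if cli_local_rank is not None else 0

os.environ["LOCAL_RANK"] = "3"
print(resolve_local_rank())  # prints 3 once LOCAL_RANK is set
```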
.. warning::

    ``local_rank`` is NOT globally unique: it is only unique per process
    on a machine. Thus, don't use it to decide if you should, e.g.,
    write to a networked filesystem. See
    https://github.com/pytorch/pytorch/issues/12042 for an example of
    how things can go wrong if you don't do this correctly.
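As a sketch of the safe pattern, assuming the launcher also exports the global
rank in the ``RANK`` environment variable: gate writes to shared storage on
the global rank, never on ``LOCAL_RANK``, since the latter repeats on every
node.

```python
import os

# Sketch: only the process with global rank 0 should write to a shared
# (networked) filesystem. LOCAL_RANK is 0 on *every* node, so gating on it
# would cause one writer per node instead of one writer per job.
def should_write_shared_output():
    return int(os.environ.get("RANK", "0")) == 0

os.environ["RANK"] = "5"
print(should_write_shared_output())  # False: only global rank 0 writes
```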
"""
import logging
import warnings

from torch.distributed.run import get_args_parser, run

logger = logging.getLogger(__name__)


def parse_args(args):
    parser = get_args_parser()
    parser.add_argument(
        "--use-env",
        "--use_env",
        default=False,
        action="store_true",
        help="Use environment variable to pass "
        "'local rank'. For legacy reasons, the default value is False. "
        "If set to True, the script will not pass "
        "--local-rank as argument, and will instead set LOCAL_RANK.",
    )
    return parser.parse_args(args)


def launch(args):
    if args.no_python and not args.use_env:
        raise ValueError(
            "When using the '--no-python' flag,"
            " you must also set the '--use-env' flag."
        )
    run(args)


def main(args=None):
    warnings.warn(
        "The module torch.distributed.launch is deprecated\n"
        "and will be removed in future. Use torchrun.\n"
        "Note that --use-env is set by default in torchrun.\n"
        "If your script expects `--local-rank` argument to be set, please\n"
        "change it to read from `os.environ['LOCAL_RANK']` instead. See \n"
        "https://pytorch.org/docs/stable/distributed.html#launch-utility for \n"
        "further instructions\n",
        FutureWarning,
    )
    args = parse_args(args)
    launch(args)


if __name__ == "__main__":
    main()