
Pipeline Parallelism Emulation

This project provides tools for emulating and visualizing pipeline parallelism strategies used in large language model training.

Overview

Pipeline parallelism is a technique used to train large models by partitioning the model across multiple devices and processing data in a pipelined fashion. This project allows you to:

  • Simulate different pipeline parallelism strategies (1F1B, Interleaved, Zero-Bubble, etc.)
  • Visualize the execution schedule on multiple devices
  • Compare different strategies for efficiency

Features

  • Supported Pipeline Strategies:

    • 1F1B (One-Forward-One-Backward)
    • Interleaved 1F1B
    • Zero-Bubble 1F1B (ZB-1P)
    • 1F1B with computation-communication overlap
    • Interleaved 1F1B with computation-communication overlap
    • DualPipe (Bidirectional pipeline parallelism with full forward-backward overlap)
  • Visualization:

    • Interactive visualization dashboard using Plotly/Dash
  • Configuration:

    • Configurable simulation parameters through Hydra
    • Customizable stage latency and communication costs

Installation

This project uses uv for dependency management.

Set up uv if it is not already installed on your machine:

# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
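
With uv available, the project's Python dependencies declared in pyproject.toml can be installed explicitly before the first run. This is a minimal sketch assuming the standard uv project workflow; uv run also resolves dependencies automatically on first use:

# From the repository root: create the virtual environment and install dependencies
uv sync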

Usage

Running the 1F1B strategy:

uv run python main.py strategy=1f1b num_devices=4 num_stages=4 num_batches=8

[1F1B schedule visualization]

Running the interleaved strategy:

uv run python main.py strategy=interleave num_devices=4 num_stages=8 num_batches=8

[Interleaved 1F1B schedule visualization]

Running the ZB-1P strategy:

uv run python main.py strategy=zb1p num_devices=4 num_stages=4 num_batches=8

[ZB-1P schedule visualization]

Running the DualPipe strategy:

uv run python main.py strategy=dualpipe num_devices=8 num_stages=8 num_batches=20

[DualPipe schedule visualization]

Running the 1F1B-batch-overlap strategy:

uv run python main.py strategy=1f1b_overlap num_devices=4 num_stages=4 num_batches=8

[1F1B-batch-overlap schedule visualization]

Running the 1F1B-interleave-overlap strategy:

uv run python main.py strategy=1f1b_interleave_overlap num_devices=4 num_stages=8 num_batches=8

[1F1B-interleave-overlap schedule visualization]

Configuration

The default configuration is in conf/config.yaml. You can override any parameter on the command line or create configuration groups for different scenarios.

Override Specific Parameters

You can override specific parameters at runtime:

uv run python main.py op_times.forward=0.5 op_times.backward=1.0 num_batches=6

Using DualPipe as an example, you can manually set different times for forward/backward/backward_D/backward_W/overlapped_forward_backward:

uv run python main.py strategy=dualpipe num_devices=8 num_stages=8 num_batches=32 op_times.forward=1.0 op_times.backward=2.0 op_times.backward_D=1.0 op_times.backward_W=1.0 op_times.overlapped_forward_backward=2.5
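
Hydra can also print the fully composed configuration and exit without running the simulation, which is a quick way to confirm that overrides are picked up. This relies on Hydra's standard --cfg flag and assumes main.py is a regular Hydra application:

# Print the resolved job configuration and exit (no simulation is run)
uv run python main.py strategy=dualpipe op_times.forward=1.0 --cfg job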

Using Different Configuration Files

You can use different configuration files with Hydra in several ways:

Recommended Approach

  1. Create multiple configuration files in the conf directory for different use cases:

    conf/
    ├── config.yaml     # Default configuration
    └── model_A.yaml    # Create your own config with stage-specific latency for performance projection
    
  2. Run with your desired configuration using the --config-name flag:

    uv run python main.py --config-name=model_A
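
Command-line overrides compose with a named config file, so a scenario-specific file can hold the stage latencies while individual parameters are still changed per run. This is standard Hydra behavior; the override value below is only illustrative:

# Start from conf/model_A.yaml and override a single parameter for this run
uv run python main.py --config-name=model_A num_batches=16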
    

Project Structure

PP-Emulation/
├── conf/                   # Hydra configuration files
│   └── config.yaml         # Default configuration
├── src/                    # Source code
│   ├── __init__.py         # Package initialization
│   ├── execution_model.py  # Schedule execution models
│   ├── strategies.py       # Pipeline parallelism strategies
│   └── visualizer.py       # Visualization utilities
├── main.py                 # Main entry point
├── pyproject.toml          # Project metadata and dependencies
└── README.md               # This file

References

  1. PipeDream: Fast and Efficient Pipeline Parallel DNN Training. (arXiv)
  2. Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM. (arXiv)
  3. Zero Bubble Pipeline Parallelism. (arXiv)
  4. Communication-Computation Overlap in MoE Training with 1F1B Pipeline Parallelism. (blog post)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.