DDP NCCL Parameters For Performance · Issue #7179 · Lightning-AI/pytorch-lightning

🚀 Feature

Motivation

From several experiments, DDP on the NCCL backend becomes noticeably faster when some NCCL parameters are tuned.

For example, XLM-RoBERTa (https://arxiv.org/abs/1911.02116): 30% speedup with NCCL_NSOCKS_PERTHREAD = 4 and NCCL_SOCKET_NTHREADS = 2.

Detectron2 (https://github.com/facebookresearch/detectron2): 15% speedup with NCCL_NSOCKS_PERTHREAD = 4 and NCCL_SOCKET_NTHREADS = 2.

Pitch

We could pass these parameters through kwargs (similarly to find_unused_parameters) and set them as environment variables in setup_environment when initializing the DDP process.
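A minimal sketch of the pitch, under stated assumptions: `SketchDDPPlugin`, its `nccl_env` keyword, and `SUPPORTED_NCCL_VARS` are hypothetical names for illustration only and are not part of Lightning's API. The only real names are the two NCCL environment variables from the experiments above and `torch.distributed.init_process_group`, which is mentioned in a comment but not called, so the sketch runs without a GPU or PyTorch.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical allow-list: the NCCL variables this sketch lets users forward.
# Both are real NCCL environment variables that tune the socket (TCP) transport.
SUPPORTED_NCCL_VARS = {"NCCL_NSOCKS_PERTHREAD", "NCCL_SOCKET_NTHREADS"}


class SketchDDPPlugin:
    """Hypothetical DDP plugin sketch; not Lightning's actual class."""

    def __init__(
        self,
        find_unused_parameters: bool = False,
        nccl_env: Optional[Mapping[str, int]] = None,
    ) -> None:
        self.find_unused_parameters = find_unused_parameters
        self.nccl_env: Dict[str, str] = {}
        for name, value in (nccl_env or {}).items():
            # Reject typos such as "NCCL_NSOCKS_PERTHREA" early.
            if name not in SUPPORTED_NCCL_VARS:
                raise ValueError(f"Unsupported NCCL variable: {name}")
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")
            # Environment variables are strings.
            self.nccl_env[name] = str(value)

    def setup_environment(self) -> None:
        # NCCL reads these variables when its communicators are created, so they
        # must be set before torch.distributed.init_process_group(backend="nccl").
        for name, value in self.nccl_env.items():
            # Explicit kwargs take precedence over values exported in the shell.
            os.environ[name] = value
        # A real plugin would initialize the process group here.


if __name__ == "__main__":
    plugin = SketchDDPPlugin(
        nccl_env={"NCCL_NSOCKS_PERTHREAD": 4, "NCCL_SOCKET_NTHREADS": 2}
    )
    plugin.setup_environment()
    print(os.environ["NCCL_NSOCKS_PERTHREAD"], os.environ["NCCL_SOCKET_NTHREADS"])
```

Validating the names against an allow-list keeps a misspelled variable from being silently ignored by NCCL; whether explicit kwargs or shell exports should win is a design choice, and this sketch picks the kwargs.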

Alternatives

Additional context