TorchElasticEnvironment — PyTorch Lightning 2.5.1.post0 documentation
class lightning.pytorch.plugins.environments.TorchElasticEnvironment
Bases: ClusterEnvironment
Environment for fault-tolerant and elastic training with torchelastic.
static detect()
Returns True if the current process was launched using the torchelastic command.
Return type: bool
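As a rough illustration of what such a check can look like: torchrun exports a set of environment variables to every worker it spawns, including `TORCHELASTIC_RUN_ID`. The sketch below is not Lightning's implementation. `looks_like_torchelastic` is a hypothetical helper, and it assumes that the presence of that one variable is a sufficient signal.

```python
import os
from typing import Mapping, Optional


def looks_like_torchelastic(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: guess whether torchrun/torchelastic launched this process."""
    env = os.environ if env is None else env
    # Assumption: the launcher exports TORCHELASTIC_RUN_ID to each worker process.
    return "TORCHELASTIC_RUN_ID" in env


# A worker started by the launcher would see the variable; a plain script would not.
print(looks_like_torchelastic({"TORCHELASTIC_RUN_ID": "job-0"}))
print(looks_like_torchelastic({}))
```

Passing the environment as a mapping keeps the check easy to exercise without actually launching a job.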
global_rank()
The rank (index) of the currently running process across all nodes and devices.
Return type: int
local_rank()
The rank (index) of the currently running process inside of the current node.
Return type: int
node_rank()
The rank (index) of the node on which the current process runs.
Return type: int
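The three ranks above are related to one another. torchrun documents the `RANK`, `LOCAL_RANK`, `GROUP_RANK`, and `LOCAL_WORLD_SIZE` environment variables for each worker. The sketch below reads them from a mapping and checks the usual relation between them. It assumes every node runs the same number of processes; the `Ranks` type and the `read_ranks` helper are hypothetical names, not part of Lightning.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Ranks:
    global_rank: int  # index across all nodes and devices
    local_rank: int   # index within the current node
    node_rank: int    # index of the node itself


def read_ranks(env: Mapping[str, str]) -> Ranks:
    """Hypothetical helper: parse the rank variables a torchrun worker receives."""
    return Ranks(
        global_rank=int(env["RANK"]),
        local_rank=int(env["LOCAL_RANK"]),
        # Assumption: treat a missing GROUP_RANK as a single-node run.
        node_rank=int(env.get("GROUP_RANK", "0")),
    )


# Example: 2 nodes with 4 processes each; the second process on the second node.
env = {"RANK": "5", "LOCAL_RANK": "1", "GROUP_RANK": "1", "LOCAL_WORLD_SIZE": "4"}
ranks = read_ranks(env)
per_node = int(env["LOCAL_WORLD_SIZE"])

# With equally sized nodes: global = node * processes_per_node + local.
assert ranks.global_rank == ranks.node_rank * per_node + ranks.local_rank
```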
validate_settings(num_devices, num_nodes)
Validates settings configured in the script against the environment, and raises an exception if there is an inconsistency.
Return type: None
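One plausible consistency check of this kind compares the number of processes implied by the script's settings with the world size reported by the launcher. The sketch below is an assumption about what such a validation could look like, not the library's code: `check_settings` is a hypothetical function, and the exception type and message are illustrative.

```python
def check_settings(num_devices: int, num_nodes: int, world_size: int) -> None:
    """Hypothetical check: devices per node times nodes should equal the world size."""
    expected = num_devices * num_nodes
    if expected != world_size:
        raise ValueError(
            f"devices={num_devices} and num_nodes={num_nodes} imply {expected} processes, "
            f"but the launcher reports a world size of {world_size}."
        )


# Consistent settings pass silently.
check_settings(num_devices=4, num_nodes=2, world_size=8)

# Inconsistent settings raise before training would start.
try:
    check_settings(num_devices=4, num_nodes=2, world_size=4)
except ValueError as err:
    print(f"inconsistent settings: {err}")
```

Failing early like this surfaces a mismatch between the script and the launch command before any process group is created.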
world_size()
The number of processes across all devices and nodes.
Return type: int
property creates_processes_externally: bool
Whether the environment creates the subprocesses or not.
property main_address: str
The main address through which all processes connect and communicate.
property main_port: int
An open and configured port in the main node through which all processes communicate.
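To make the address and port concrete: torchrun exports `MASTER_ADDR` and `MASTER_PORT` to each worker, and together they identify the endpoint where processes meet. The sketch below reads both from a mapping and builds a `tcp://` URL of the form accepted as an `init_method` by `torch.distributed.init_process_group`. `rendezvous_endpoint` is a hypothetical helper, and the fallback values are placeholders chosen for illustration, not documented defaults of this class.

```python
from typing import Mapping, Tuple


def rendezvous_endpoint(env: Mapping[str, str]) -> Tuple[str, int]:
    """Hypothetical helper: read the main address and port from launcher variables."""
    # Assumption: fall back to localhost and an arbitrary port when unset.
    address = env.get("MASTER_ADDR", "127.0.0.1")
    port = int(env.get("MASTER_PORT", "29500"))
    return address, port


address, port = rendezvous_endpoint({"MASTER_ADDR": "10.0.0.1", "MASTER_PORT": "29400"})
init_method = f"tcp://{address}:{port}"
print(init_method)  # tcp://10.0.0.1:29400
```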