KubeflowEnvironment - PyTorch Lightning 2.6.0 documentation
class lightning.pytorch.plugins.environments.KubeflowEnvironment
Bases: ClusterEnvironment
Environment for distributed training using the PyTorchJob operator from Kubeflow.
This environment, unlike others, does not get auto-detected and needs to be passed to the Fabric/Trainer constructor manually.
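Because the environment is not auto-detected, it has to be handed to the Trainer explicitly. The following is a minimal sketch, not a definitive setup: the helper name `make_trainer` is hypothetical, and `devices=1` with `strategy="ddp"` assumes one training process per PyTorchJob pod. The Lightning imports are done inside the function so the snippet can be loaded on a machine without Lightning installed.

```python
def make_trainer(num_nodes: int = 2):
    """Build a Trainer that uses KubeflowEnvironment (hypothetical helper).

    Intended to be called inside a pod started by the PyTorchJob operator.
    """
    # Lazy imports: Lightning is only required when the function is called.
    from lightning.pytorch import Trainer
    from lightning.pytorch.plugins.environments import KubeflowEnvironment

    return Trainer(
        accelerator="auto",
        devices=1,            # assumption: one process per pod
        num_nodes=num_nodes,  # assumption: should match the number of pods in the job
        strategy="ddp",
        # The environment is passed manually because it is not auto-detected.
        plugins=[KubeflowEnvironment()],
    )
```

The same `plugins=[KubeflowEnvironment()]` argument should apply to the Fabric constructor, with the environment imported from the corresponding Fabric package.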
static detect()
Detects the environment settings corresponding to this cluster and returns True if they match.
Return type: bool
global_rank()
The rank (index) of the currently running process across all nodes and devices.
Return type: int
local_rank()
The rank (index) of the currently running process within the current node.
Return type: int
node_rank()
The rank (index) of the node on which the current process runs.
Return type: int
world_size()
The number of processes across all devices and nodes.
Return type: int
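The four rank and size values described above can be illustrated with a small standalone sketch. It assumes that the PyTorchJob operator exports `RANK` and `WORLD_SIZE` to each pod and that every pod runs a single process, so the local rank is 0 and the node rank equals the global rank. These are assumptions made for illustration, not something this page states, and `Ranks` and `ranks_from_env` are hypothetical names that are not part of Lightning.

```python
from typing import Mapping, NamedTuple


class Ranks(NamedTuple):
    global_rank: int
    local_rank: int
    node_rank: int
    world_size: int


def ranks_from_env(env: Mapping[str, str]) -> Ranks:
    """Derive rank information from PyTorchJob-style variables (illustrative only)."""
    global_rank = int(env["RANK"])
    world_size = int(env["WORLD_SIZE"])
    if not 0 <= global_rank < world_size:
        raise ValueError(f"RANK={global_rank} is outside [0, {world_size})")
    # Assumption: one process per pod, so the process is always local rank 0
    # and the node index coincides with the global rank.
    return Ranks(global_rank, 0, global_rank, world_size)


# Example with a plain dict standing in for os.environ:
example = ranks_from_env({"RANK": "3", "WORLD_SIZE": "4"})
# example == Ranks(global_rank=3, local_rank=0, node_rank=3, world_size=4)
```

Passing a mapping instead of reading `os.environ` directly keeps the sketch easy to check outside a cluster.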
property creates_processes_externally: bool
Whether the environment creates the subprocesses or not.
property main_address: str
The main address through which all processes connect and communicate.
property main_port: int
An open and configured port on the main node through which all processes communicate.
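As a rough illustration of how the main address and port might be resolved, the sketch below reads them from a mapping of environment variables. It assumes the PyTorchJob operator provides `MASTER_ADDR` and `MASTER_PORT`, as is conventional for `torch.distributed`. This page does not confirm that the class reads exactly these variables, and `rendezvous_endpoint` is a hypothetical helper, not a Lightning function.

```python
from typing import Mapping, Tuple


def rendezvous_endpoint(env: Mapping[str, str]) -> Tuple[str, int]:
    """Return (address, port) of the main node from PyTorchJob-style variables."""
    missing = [key for key in ("MASTER_ADDR", "MASTER_PORT") if key not in env]
    if missing:
        raise KeyError(f"missing rendezvous variables: {', '.join(missing)}")
    # Environment variables are strings, so the port is converted to an integer.
    port = int(env["MASTER_PORT"])
    if not 0 < port < 65536:
        raise ValueError(f"MASTER_PORT={port} is not a valid TCP port")
    return env["MASTER_ADDR"], port
```

Inside a pod this could be called as `rendezvous_endpoint(os.environ)`. Failing early on a missing variable tends to give a clearer error than a connection timeout later in startup.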