KubeflowEnvironment - PyTorch Lightning 2.6.0 documentation

class lightning.pytorch.plugins.environments.KubeflowEnvironment

Bases: ClusterEnvironment

Environment for distributed training using the PyTorchJob operator from Kubeflow.

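Unlike other cluster environments, this one is not auto-detected. You must pass an instance to the Fabric or Trainer constructor manually, for example through the `plugins` argument: `Trainer(plugins=[KubeflowEnvironment()])`.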
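The difference between auto-detected and manually supplied environments can be sketched with toy classes. Every name below (`ToyEnvironment`, `ToySlurmLike`, `ToyKubeflowLike`, `choose_environment`) is hypothetical, and this is not Lightning's actual selection code. It only assumes the general idea that an explicitly passed environment takes precedence, and that an environment which never reports itself through `detect()` is used only when supplied by hand.

```python
from typing import Optional, Sequence, Type


class ToyEnvironment:
    """Hypothetical stand-in for a cluster environment (not Lightning's class)."""

    name = "toy"

    @staticmethod
    def detect() -> bool:
        return False


class ToySlurmLike(ToyEnvironment):
    name = "slurm-like"

    @staticmethod
    def detect() -> bool:
        # Pretend the scheduler's environment variables were found.
        return True


class ToyKubeflowLike(ToyEnvironment):
    name = "kubeflow-like"

    @staticmethod
    def detect() -> bool:
        # Opts out of auto-detection, so it is only used when passed explicitly.
        return False


def choose_environment(
    explicit: Optional[ToyEnvironment],
    candidates: Sequence[Type[ToyEnvironment]],
) -> str:
    """Return the name of the environment that would be used."""
    # An explicitly supplied environment always wins.
    if explicit is not None:
        return explicit.name
    # Otherwise, take the first candidate that detects itself.
    for env_cls in candidates:
        if env_cls.detect():
            return env_cls.name
    # Fall back to a default single-machine environment.
    return "default"


print(choose_environment(None, [ToyKubeflowLike]))            # default
print(choose_environment(ToyKubeflowLike(), [ToySlurmLike]))  # kubeflow-like
```

Because the Kubeflow-like environment never claims itself during detection, leaving it out of the constructor silently falls back to the default, which is why it has to be passed explicitly.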

static detect()

Detects the environment settings corresponding to this cluster and returns True if they match.

Return type:

bool

global_rank()

The rank (index) of the currently running process across all nodes and devices.

Return type:

int

local_rank()

The rank (index) of the currently running process within the current node.

Return type:

int

node_rank()

The rank (index) of the node on which the current process runs.

Return type:

int
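The three rank methods are related. Assuming ranks are assigned node by node with the same number of processes on every node, `global_rank = node_rank * procs_per_node + local_rank`. The helper below is a hypothetical sketch, not part of Lightning. Its second call shows the one-process-per-node case (for instance, if each PyTorchJob replica pod runs a single training process, which is an assumption here): the local rank is always 0 and the node rank equals the global rank.

```python
from typing import NamedTuple


class Ranks(NamedTuple):
    global_rank: int
    local_rank: int
    node_rank: int


def split_global_rank(global_rank: int, procs_per_node: int) -> Ranks:
    """Derive local and node ranks from a global rank.

    Assumes ranks are assigned node by node and every node runs
    the same number of processes (procs_per_node).
    """
    if procs_per_node < 1:
        raise ValueError("procs_per_node must be at least 1")
    if global_rank < 0:
        raise ValueError("global_rank must be non-negative")
    # global_rank = node_rank * procs_per_node + local_rank
    node_rank, local_rank = divmod(global_rank, procs_per_node)
    return Ranks(global_rank, local_rank, node_rank)


print(split_global_rank(5, 4))  # Ranks(global_rank=5, local_rank=1, node_rank=1)
print(split_global_rank(3, 1))  # Ranks(global_rank=3, local_rank=0, node_rank=3)
```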

world_size()

The number of processes across all devices and nodes.

Return type:

int
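The world size is the total process count, so with a uniform layout it equals the number of nodes times the processes per node. For example, 4 nodes with 2 processes each give a world size of 8. The sketch below assumes the launcher publishes that total in a `WORLD_SIZE` variable, the name PyTorch's `env://` initialization reads. Both helper functions are hypothetical and do not reproduce Lightning's implementation.

```python
from typing import Mapping


def expected_world_size(num_nodes: int, procs_per_node: int) -> int:
    # Every node runs the same number of processes, so the total is a product.
    return num_nodes * procs_per_node


def world_size_from_env(env: Mapping[str, str]) -> int:
    """Parse a WORLD_SIZE-style variable and reject non-positive values."""
    size = int(env["WORLD_SIZE"])
    if size < 1:
        raise ValueError(f"WORLD_SIZE must be positive, got {size}")
    return size


env = {"WORLD_SIZE": "8"}  # hypothetical values for a 4-node, 2-process-per-node job
assert world_size_from_env(env) == expected_world_size(num_nodes=4, procs_per_node=2)
```

Passing a plain mapping instead of reading `os.environ` directly keeps the helper easy to test with made-up values.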

property creates_processes_externally: bool

Whether the processes are launched externally (for example, by the cluster's job launcher) instead of being spawned by Lightning.

property main_address: str

The main address through which all processes connect and communicate.

property main_port: int

An open and configured port in the main node through which all processes communicate.
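`main_address` and `main_port` together identify the rendezvous point for the process group. As a sketch, assuming the values arrive in `MASTER_ADDR` and `MASTER_PORT` variables (the names PyTorch's `env://` initialization uses), they can be combined into the `tcp://host:port` form that `torch.distributed.init_process_group(init_method=...)` accepts. `init_method_from_env` is a hypothetical helper, and the code deliberately does not import torch.

```python
from typing import Mapping


def init_method_from_env(env: Mapping[str, str]) -> str:
    """Build a tcp:// rendezvous URL from MASTER_ADDR / MASTER_PORT-style variables."""
    address = env["MASTER_ADDR"]  # IPv6 literals would need [brackets]; not handled here
    port = int(env["MASTER_PORT"])  # raises ValueError if the port is not an integer
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"tcp://{address}:{port}"


env = {"MASTER_ADDR": "10.0.0.1", "MASTER_PORT": "23456"}  # hypothetical values
print(init_method_from_env(env))  # tcp://10.0.0.1:23456
```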