Accelerator · Hugging Face

The Accelerator is the main class for enabling distributed training on any type of training setup. Read the Add Accelerator to your code tutorial to learn more about how to add the Accelerator to your script.

class accelerate.Accelerator

( device_placement: bool = True
  split_batches: bool = &lt;object object&gt;
  mixed_precision: PrecisionType | str | None = None
  gradient_accumulation_steps: int = 1
  cpu: bool = False
  dataloader_config: DataLoaderConfiguration | None = None
  deepspeed_plugin: DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None = None
  fsdp_plugin: FullyShardedDataParallelPlugin | None = None
  torch_tp_plugin: TorchTensorParallelPlugin | None = None
  megatron_lm_plugin: MegatronLMPlugin | None = None
  rng_types: list[str | RNGType] | None = None
  log_with: str | LoggerType | GeneralTracker | list[str | LoggerType | GeneralTracker] | None = None
  project_dir: str | os.PathLike | None = None
  project_config: ProjectConfiguration | None = None
  gradient_accumulation_plugin: GradientAccumulationPlugin | None = None
  step_scheduler_with_optimizer: bool = True
  kwargs_handlers: list[KwargsHandler] | None = None
  dynamo_backend: DynamoBackend | str | None = None
  dynamo_plugin: TorchDynamoPlugin | None = None
  deepspeed_plugins: DeepSpeedPlugin | dict[str, DeepSpeedPlugin] | None = None
  parallelism_config: ParallelismConfig | None = None )

Creates an instance of an accelerator for distributed training or mixed precision training.


accumulate

( *models )

A context manager that lightly wraps around and performs gradient accumulation automatically.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator(gradient_accumulation_steps=1)
>>> dataloader, model, optimizer, scheduler = accelerator.prepare(dataloader, model, optimizer, scheduler)

>>> for input, output in dataloader:
...     with accelerator.accumulate(model):
...         outputs = model(input)
...         loss = loss_func(outputs)
...         loss.backward()
...         optimizer.step()
...         scheduler.step()
...         optimizer.zero_grad()

autocast

( autocast_handler: AutocastKwargs = None )

Will apply automatic mixed precision inside the block wrapped by this context manager, if it is enabled. Nothing different will happen otherwise.

A different autocast_handler can be passed in to override the one set in the Accelerator object. This is useful in blocks under autocast where you want to revert to fp32.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator(mixed_precision="fp16")
>>> with accelerator.autocast():
...     train()

backward

Scales the gradients in accordance with the GradientAccumulationPlugin and calls the correct backward() based on the configuration.

Should be used in lieu of loss.backward().

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator(gradient_accumulation_steps=2)
>>> outputs = model(inputs)
>>> loss = loss_fn(outputs, labels)
>>> accelerator.backward(loss)

check_trigger

Checks if the internal trigger tensor has been set to 1 in any of the processes. If so, will return True and reset the trigger tensor to 0.

Note: Does not require wait_for_everyone()

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> if should_do_breakpoint(loss):
...     accelerator.set_trigger()
>>> # Assume later in the training script
>>> if accelerator.check_trigger():
...     break

clear

Alias for Accelerator.free_memory; releases all references to the internal objects stored and calls the garbage collector. You should call this method between two trainings with different models/optimizers.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> model, optimizer, scheduler = ...
>>> model, optimizer, scheduler = accelerator.prepare(model, optimizer, scheduler)
>>> model, optimizer, scheduler = accelerator.clear(model, optimizer, scheduler)

clip_grad_norm_


( parameters max_norm norm_type = 2 ) → torch.Tensor

Total norm of the parameter gradients (viewed as a single vector).

Should be used in place of torch.nn.utils.clip_grad_norm_.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator(gradient_accumulation_steps=2)
>>> dataloader, model, optimizer, scheduler = accelerator.prepare(dataloader, model, optimizer, scheduler)

>>> for input, target in dataloader:
...     optimizer.zero_grad()
...     output = model(input)
...     loss = loss_func(output, target)
...     accelerator.backward(loss)
...     if accelerator.sync_gradients:
...         accelerator.clip_grad_norm_(model.parameters(), max_grad_norm)
...     optimizer.step()

clip_grad_value_


( parameters clip_value )

Should be used in place of torch.nn.utils.clip_grad_value_.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator(gradient_accumulation_steps=2)
>>> dataloader, model, optimizer, scheduler = accelerator.prepare(dataloader, model, optimizer, scheduler)

>>> for input, target in dataloader:
...     optimizer.zero_grad()
...     output = model(input)
...     loss = loss_func(output, target)
...     accelerator.backward(loss)
...     if accelerator.sync_gradients:
...         accelerator.clip_grad_value_(model.parameters(), clip_value)
...     optimizer.step()

deepspeed_ulysses_dl_adapter


( dl model )

This is normally called as part of prepare(), but when the dataloader was prepared separately from the model (i.e. via an external accelerator.prepare call), this additional call needs to be made after prepare(model) (see the HF Trainer for the use case).

end_training

Runs any special end-of-training behaviors, such as stopping trackers on the main process only or destroying the process group. Should always be called at the end of your script if using experiment tracking.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator(log_with="tensorboard")
>>> accelerator.init_trackers("my_project")
>>> # Do training
>>> accelerator.end_training()

free_memory

Will release all references to the internal objects stored and call the garbage collector. You should call this method between two trainings with different models/optimizers. Also resets Accelerator.step to 0.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> model, optimizer, scheduler = ...
>>> model, optimizer, scheduler = accelerator.prepare(model, optimizer, scheduler)
>>> model, optimizer, scheduler = accelerator.free_memory(model, optimizer, scheduler)

gather


( tensor ) → torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor


Returns

torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor

The gathered tensor(s). Note that the first dimension of the result is num_processes multiplied by the first dimension of the input tensors.

Gather the values in tensor across all processes and concatenate them on the first dimension. Useful to regroup the predictions from all processes when doing evaluation.

Note: This gather happens in all processes.

Example:

>>> import torch
>>> from accelerate import Accelerator

>>> # Assuming four processes
>>> accelerator = Accelerator()
>>> process_tensor = torch.tensor([accelerator.process_index], device=accelerator.device)
>>> gathered_tensor = accelerator.gather(process_tensor)
>>> gathered_tensor
tensor([0, 1, 2, 3])

gather_for_metrics


( input_data use_gather_object = False )


Gathers input_data and potentially drops duplicates in the last batch if on a distributed system. Should be used for gathering the inputs and targets for metric calculation.

Example:

>>> import torch
>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> dataloader = torch.utils.data.DataLoader(range(9), batch_size=5)
>>> dataloader = accelerator.prepare(dataloader)
>>> batch = next(iter(dataloader))
>>> gathered_items = accelerator.gather_for_metrics(batch)
>>> len(gathered_items)
9
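The duplicate-dropping behavior can be mimicked in plain Python. The sketch below is illustrative only: the helper name and the hard-coded batches are made up, and this is not the accelerate implementation.

```python
def gather_for_metrics_sketch(per_process_batches, dataset_length):
    # Hypothetical helper: concatenate each process's (possibly padded)
    # batch, then drop the trailing duplicates that were added to make
    # the last batch divisible across processes.
    gathered = [item for batch in per_process_batches for item in batch]
    return gathered[:dataset_length]

# 9 samples across 2 processes with batch_size=5: the second process's
# last batch wraps around and repeats sample 0 as padding, which is dropped.
result = gather_for_metrics_sketch([[0, 1, 2, 3, 4], [5, 6, 7, 8, 0]], dataset_length=9)
print(result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The same idea applies to tensors: gather first, then truncate to the true dataset length.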

get_state_dict


( model unwrap = True ) → dict


The state dictionary of the model potentially without full precision.

Returns the state dictionary of a model sent through Accelerator.prepare() potentially without full precision.

Example:

>>> import torch
>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> net = torch.nn.Linear(2, 2)
>>> net = accelerator.prepare(net)
>>> state_dict = accelerator.get_state_dict(net)

get_tracker


( name: str unwrap: bool = False ) → GeneralTracker


The tracker corresponding to name if it exists.

Returns a tracker from self.trackers based on name on the main process only.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator(log_with="tensorboard")
>>> accelerator.init_trackers("my_project")
>>> tensorboard_tracker = accelerator.get_tracker("tensorboard")

join_uneven_inputs


( joinables even_batches = None )


A context manager that facilitates distributed training or evaluation on uneven inputs, which acts as a wrapper around torch.distributed.algorithms.join. This is useful when the total batch size does not evenly divide the length of the dataset.

join_uneven_inputs is only supported for Distributed Data Parallel training on multiple GPUs. For any other configuration, this method will have no effect.

Overriding even_batches will not affect iterable-style data loaders.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator(even_batches=True)
>>> ddp_model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

>>> with accelerator.join_uneven_inputs([ddp_model], even_batches=False):
...     for input, output in dataloader:
...         outputs = model(input)
...         loss = loss_func(outputs)
...         loss.backward()
...         optimizer.step()
...         optimizer.zero_grad()

load_state


( input_dir: str | None = None load_kwargs: dict | None = None **load_model_func_kwargs )


Loads the current states of the model, optimizer, scaler, RNG generators, and registered objects.

Should only be used in conjunction with Accelerator.save_state(). If a file is not registered for checkpointing, it will not be loaded if stored in the directory.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> model, optimizer, lr_scheduler = ...
>>> model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)
>>> accelerator.load_state("my_checkpoint")

local_main_process_first

Lets the local main process go first inside a with block.

The other processes will enter the with block after the main process exits.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> with accelerator.local_main_process_first():
...     # This will be printed first by local process 0, then by the other
...     # local processes once it exits the block.
...     print(f"This will be printed by process {accelerator.local_process_index}")

lomo_backward


( loss: torch.Tensor learning_rate: float )

Runs backward pass on LOMO optimizers.

main_process_first

Lets the main process go first inside a with block.

The other processes will enter the with block after the main process exits.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> with accelerator.main_process_first():
...     # This will be printed first by process 0, then by the other
...     # processes once it exits the block.
...     print(f"This will be printed by process {accelerator.process_index}")

maybe_context_parallel


( buffers: list[torch.Tensor] | None = None buffer_seq_dims: list[int] | None = None no_restore_buffers: set[torch.Tensor] | None = None )


A context manager that enables context parallel training.

context_parallel is currently supported only with FSDP2 and requires parallelism_config.cp_size > 1. If either of these conditions is not met, this context manager will have no effect; to keep code changes minimal it will not raise an exception.

This context manager has to be recreated with each training step, as shown in the example below.

Example:

>>> for batch in dataloader:
...     with accelerator.maybe_context_parallel(
...         buffers=[batch["input_ids"], batch["attention_mask"]],
...         buffer_seq_dims=[1, 1],
...         no_restore_buffers={batch["input_ids"]},
...     ):
...         outputs = model(batch)
...         ...

no_sync


( model )

A context manager to disable gradient synchronizations across DDP processes by calling torch.nn.parallel.DistributedDataParallel.no_sync.

If model is not in DDP, this context manager does nothing.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> dataloader, model, optimizer = accelerator.prepare(dataloader, model, optimizer)
>>> input_a = next(iter(dataloader))
>>> input_b = next(iter(dataloader))

>>> with accelerator.no_sync():
...     outputs = model(input_a)
...     loss = loss_func(outputs)
...     accelerator.backward(loss)
...     # No synchronization across processes, only accumulate gradients
>>> outputs = model(input_b)
>>> loss = loss_func(outputs)
>>> accelerator.backward(loss)
>>> # Synchronization across all processes
>>> optimizer.step()
>>> optimizer.zero_grad()

on_last_process


( function: Callable[..., Any] )

A decorator that will run the decorated function on the last process only. Can also be called using the PartialState class.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()

>>> @accelerator.on_last_process
... def print_something():
...     print(f"Printed on process {accelerator.process_index}")

>>> print_something()
"Printed on process 3"

on_local_main_process


( function: Callable[..., Any] | None = None )

A decorator that will run the decorated function on the local main process only. Can also be called using the PartialState class.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()

>>> @accelerator.on_local_main_process
... def print_something():
...     print("This will be printed by process 0 only on each server.")

>>> print_something()
# On server 1:
"This will be printed by process 0 only on each server."
# On server 2:
"This will be printed by process 0 only on each server."

on_local_process


( function: Callable[..., Any] | None = None local_process_index: int | None = None )


A decorator that will run the decorated function on a given local process index only. Can also be called using the PartialState class.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()

>>> @accelerator.on_local_process(local_process_index=2)
... def print_something():
...     print(f"Printed on process {accelerator.local_process_index}")

>>> print_something()
# On server 1:
"Printed on process 2"
# On server 2:
"Printed on process 2"

on_main_process


( function: Callable[..., Any] | None = None )

A decorator that will run the decorated function on the main process only. Can also be called using the PartialState class.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()

>>> @accelerator.on_main_process
... def print_something():
...     print("This will be printed by process 0 only.")

>>> print_something()
"This will be printed by process 0 only."

on_process


( function: Callable[..., Any] | None = None process_index: int | None = None )

A decorator that will run the decorated function on a given process index only. Can also be called using the PartialState class.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()

>>> @accelerator.on_process(process_index=2)
... def print_something():
...     print(f"Printed on process {accelerator.process_index}")

>>> print_something()
"Printed on process 2"

pad_across_processes


( tensor dim = 0 pad_index = 0 pad_first = False ) → torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor


Returns

torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor

The padded tensor(s).

Recursively pad the tensors in a nested list/tuple/dictionary of tensors from all devices to the same size so they can safely be gathered.

Example:

>>> import torch
>>> from accelerate import Accelerator

>>> # Assuming two processes, where the first has a tensor of size 1 and the second of size 2
>>> accelerator = Accelerator()
>>> process_tensor = torch.arange(accelerator.process_index + 1).to(accelerator.device)
>>> padded_tensor = accelerator.pad_across_processes(process_tensor)
>>> padded_tensor.shape
torch.Size([2])

prepare


( *args device_placement = None )

Prepare all objects passed in args for distributed training and mixed precision, then return them in the same order.

You don't need to prepare a model if you only use it for inference without any kind of mixed precision.

Examples:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> # Assume a model, optimizer, data_loader and scheduler are defined
>>> model, optimizer, data_loader, scheduler = accelerator.prepare(model, optimizer, data_loader, scheduler)

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> # Assume a model, optimizer, data_loader and scheduler are defined
>>> device_placement = [True, True, False, False]
>>> model, optimizer, data_loader, scheduler = accelerator.prepare(
...     model, optimizer, data_loader, scheduler, device_placement=device_placement
... )

prepare_data_loader


( data_loader: torch.utils.data.DataLoader device_placement = None slice_fn_for_dispatch = None )

Prepares a PyTorch DataLoader for training in any distributed setup. It is recommended to use Accelerator.prepare() instead.

Example:

>>> import torch
>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> data_loader = torch.utils.data.DataLoader(...)
>>> data_loader = accelerator.prepare_data_loader(data_loader, device_placement=True)

prepare_model


( model: torch.nn.Module device_placement: bool | None = None evaluation_mode: bool = False )

Prepares a PyTorch model for training in any distributed setup. It is recommended to use Accelerator.prepare() instead.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> # Assume a model is defined
>>> model = accelerator.prepare_model(model)

prepare_optimizer


( optimizer: torch.optim.Optimizer device_placement = None )

Prepares a PyTorch Optimizer for training in any distributed setup. It is recommended to use Accelerator.prepare() instead.

Example:

>>> import torch
>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> optimizer = torch.optim.Adam(...)
>>> optimizer = accelerator.prepare_optimizer(optimizer, device_placement=True)

prepare_scheduler


( scheduler: LRScheduler )

Prepares a PyTorch Scheduler for training in any distributed setup. It is recommended to use Accelerator.prepare() instead.

Example:

>>> import torch
>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> optimizer = torch.optim.Adam(...)
>>> scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, ...)
>>> scheduler = accelerator.prepare_scheduler(scheduler)

print

Drop-in replacement for print() that prints only once per server.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> accelerator.print("Hello world!")

profile


( profile_handler: ProfileKwargs | None = None )

Will profile the code inside the context manager. The profile will be saved to a Chrome Trace file if profile_handler.output_trace_dir is set.

A different profile_handler can be passed in to override the one set in the Accelerator object.

Example:

>>> from accelerate import Accelerator
>>> from accelerate.utils import ProfileKwargs

>>> # Profile with the default settings
>>> accelerator = Accelerator()
>>> with accelerator.profile() as prof:
...     train()
>>> accelerator.print(prof.key_averages().table())

>>> # Profile with a custom handler
>>> def custom_handler(prof):
...     print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))

>>> kwargs = ProfileKwargs(schedule_option=dict(wait=1, warmup=1, active=1), on_trace_ready=custom_handler)
>>> accelerator = Accelerator(kwargs_handlers=[kwargs])
>>> with accelerator.profile() as prof:
...     for _ in range(10):
...         train_iteration()
...         prof.step()

>>> # Profile and export to a Chrome Trace
>>> kwargs = ProfileKwargs(output_trace_dir="output_trace")
>>> accelerator = Accelerator(kwargs_handlers=[kwargs])
>>> with accelerator.profile():
...     train()

reduce


( tensor reduction = 'sum' scale = 1.0 ) → torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor


Returns

torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor

The reduced tensor(s).

Reduce the values in tensor across all processes based on reduction.

Note: All processes get the reduced value.

Example:

>>> import torch
>>> from accelerate import Accelerator

>>> # Assuming two processes
>>> accelerator = Accelerator()
>>> process_tensor = torch.arange(accelerator.num_processes) + 1 + (2 * accelerator.process_index)
>>> process_tensor = process_tensor.to(accelerator.device)
>>> reduced_tensor = accelerator.reduce(process_tensor, reduction="sum")
>>> reduced_tensor
tensor([4, 6])

register_for_checkpointing

Makes note of objects and will save or load them during save_state or load_state.

These should be utilized when the state is being loaded or saved in the same script. It is not designed to be used in different scripts.

Every object must have a load_state_dict and state_dict function to be stored.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> # Assume `CustomObject` has a `state_dict` and `load_state_dict` function
>>> obj = CustomObject()
>>> accelerator.register_for_checkpointing(obj)
>>> accelerator.save_state("checkpoint.pt")

register_load_state_pre_hook


( hook: Callable[..., None] ) → torch.utils.hooks.RemovableHandle


Returns

torch.utils.hooks.RemovableHandle

a handle that can be used to remove the added hook by calling handle.remove()

Registers a pre hook to be run before load_checkpoint is called in Accelerator.load_state().

The hook should have the following signature:

hook(models: list[torch.nn.Module], input_dir: str) -> None

The models argument is the list of models saved in the accelerator state under accelerator._models, and the input_dir argument is the input_dir argument passed to Accelerator.load_state().

Should only be used in conjunction with Accelerator.register_save_state_pre_hook(). Can be useful to load configurations in addition to model weights. Can also be used to overwrite model loading with a customized method. In this case, make sure to remove already loaded models from the models list.
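To see the hook contract in isolation, here is a plain-Python sketch that needs no distributed setup. Everything below (load_config_hook, run_load_hooks, the config file name) is hypothetical; run_load_hooks merely stands in for how load_state dispatches registered hooks, it is not the accelerate implementation.

```python
import json
import os
import tempfile

def load_config_hook(models, input_dir):
    # Hypothetical hook matching the documented signature:
    #   hook(models: list[torch.nn.Module], input_dir: str) -> None
    # It loads an extra config file stored next to the checkpoint.
    path = os.path.join(input_dir, "custom_config.json")
    with open(path) as f:
        load_config_hook.loaded = json.load(f)

def run_load_hooks(hooks, models, input_dir):
    # Simplified stand-in for the internal hook dispatch in load_state.
    for hook in hooks:
        hook(models, input_dir)

checkpoint_dir = tempfile.mkdtemp()
with open(os.path.join(checkpoint_dir, "custom_config.json"), "w") as f:
    json.dump({"epoch": 3}, f)

run_load_hooks([load_config_hook], models=[], input_dir=checkpoint_dir)
print(load_config_hook.loaded)  # {'epoch': 3}
```

A real hook registered via register_load_state_pre_hook would receive the prepared models and the checkpoint directory in the same way.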

register_save_state_pre_hook


( hook: Callable[..., None] ) → torch.utils.hooks.RemovableHandle


Returns

torch.utils.hooks.RemovableHandle

a handle that can be used to remove the added hook by calling handle.remove()

Registers a pre hook to be run before save_checkpoint is called in Accelerator.save_state().

The hook should have the following signature:

hook(models: list[torch.nn.Module], weights: list[dict[str, torch.Tensor]], output_dir: str) -> None

The models argument is the list of models saved in the accelerator state under accelerator._models, the weights argument is the list of the models' state dicts, and the output_dir argument is the output_dir argument passed to Accelerator.save_state().

Should only be used in conjunction with Accelerator.register_load_state_pre_hook(). Can be useful to save configurations in addition to model weights. Can also be used to overwrite model saving with a customized method. In this case, make sure to remove already saved weights from the weights list.
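The "remove already saved weights" pattern can be sketched in plain Python. All names below are hypothetical, and run_save_hooks only stands in for the internal dispatch: the hook takes over saving the first model's weights and removes them from the weights list so the default saving path skips them.

```python
import os
import tempfile

def custom_save_hook(models, weights, output_dir):
    # Hypothetical hook matching the documented contract. It saves the
    # first model's state dict itself and removes it from `weights` so
    # the default saving logic will not save it again.
    state = weights.pop(0)
    with open(os.path.join(output_dir, "model_0.txt"), "w") as f:
        f.write("\n".join(sorted(state)))

def run_save_hooks(hooks, models, weights, output_dir):
    # Simplified stand-in for the internal hook dispatch in save_state.
    for hook in hooks:
        hook(models, weights, output_dir)
    return weights  # what the default path would still save

out_dir = tempfile.mkdtemp()
remaining = run_save_hooks(
    [custom_save_hook],
    models=[None],
    weights=[{"linear.weight": [1.0], "linear.bias": [0.0]}],
    output_dir=out_dir,
)
print(remaining)            # [] -- nothing left for the default path
print(os.listdir(out_dir))  # ['model_0.txt']
```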

save


( obj f safe_serialization = False )


Save the object passed to disk once per machine. Use in place of torch.save.

Note: If save_on_each_node was passed in as a ProjectConfiguration, will save the object once per node, rather than only once on the main node.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> arr = [0, 1, 2, 3]
>>> accelerator.save(arr, "array.pkl")

save_model


( model: torch.nn.Module save_directory: Union[str, os.PathLike] max_shard_size: Union[int, str] = '10GB' safe_serialization: bool = True )

Save a model so that it can be re-loaded using load_checkpoint_in_model.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> model = ...
>>> accelerator.save_model(model, save_directory)

save_state


( output_dir: str | None = None safe_serialization: bool = True **save_model_func_kwargs )


Saves the current states of the model, optimizer, scaler, RNG generators, and registered objects to a folder.

If a ProjectConfiguration was passed to the Accelerator object with automatic_checkpoint_naming enabled, then checkpoints will be saved to self.project_dir/checkpoints. If the number of current saves is greater than total_limit, then the oldest save is deleted. Each checkpoint is saved in a separate folder named checkpoint_&lt;iteration&gt;.

Otherwise they are just saved to output_dir.

Should only be used when wanting to save a checkpoint during training and restoring the state in the same environment.
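The checkpoint rotation described above can be sketched in plain Python (rotate_and_save is a made-up helper illustrating the naming and total_limit behavior, not the accelerate implementation):

```python
import os
import shutil
import tempfile

def rotate_and_save(project_dir, iteration, total_limit):
    # Each save goes to <project_dir>/checkpoints/checkpoint_<iteration>;
    # once the number of saves would exceed total_limit, the oldest
    # checkpoint folder is deleted first.
    root = os.path.join(project_dir, "checkpoints")
    os.makedirs(root, exist_ok=True)
    existing = sorted(os.listdir(root), key=lambda name: int(name.split("_")[1]))
    while len(existing) >= total_limit:
        shutil.rmtree(os.path.join(root, existing.pop(0)))
    os.makedirs(os.path.join(root, f"checkpoint_{iteration}"))

project_dir = tempfile.mkdtemp()
for iteration in range(4):
    rotate_and_save(project_dir, iteration, total_limit=2)

print(sorted(os.listdir(os.path.join(project_dir, "checkpoints"))))
# ['checkpoint_2', 'checkpoint_3']
```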

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> model, optimizer, lr_scheduler = ...
>>> model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)
>>> accelerator.save_state(output_dir="my_checkpoint")

set_trigger

Sets the internal trigger tensor to 1 on the current process. A later check (e.g. via check_trigger()) should follow, which will check across all processes.

Note: Does not require wait_for_everyone()

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> if should_do_breakpoint(loss):
...     accelerator.set_trigger()
>>> # Assume later in the training script
>>> if accelerator.check_trigger():
...     break

skip_first_batches


( dataloader num_batches: int = 0 )


Creates a new torch.utils.data.DataLoader that will efficiently skip the first num_batches.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> dataloader, model, optimizer, scheduler = accelerator.prepare(dataloader, model, optimizer, scheduler)
>>> skipped_dataloader = accelerator.skip_first_batches(dataloader, num_batches=2)
>>> # for the first epoch only
>>> for input, target in skipped_dataloader:
...     optimizer.zero_grad()
...     output = model(input)
...     loss = loss_func(output, target)
...     accelerator.backward(loss)
...     optimizer.step()

>>> # subsequent epochs
>>> for input, target in dataloader:
...     optimizer.zero_grad()
...     ...

split_between_processes


( inputs: list | tuple | dict | torch.Tensor apply_padding: bool = False )

Splits inputs between self.num_processes quickly; the result can then be used on each process. Useful when doing distributed inference, such as with different prompts.

Note that when using a dict, all keys need to have the same number of elements.

Example:

>>> # Assume there are two processes
>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> with accelerator.split_between_processes(["A", "B", "C"]) as inputs:
...     print(inputs)
# Process 0
["A", "B"]
# Process 1
["C"]

>>> with accelerator.split_between_processes(["A", "B", "C"], apply_padding=True) as inputs:
...     print(inputs)
# Process 0
["A", "B"]
# Process 1
["C", "C"]

trigger_sync_in_backward


( model )

Trigger the sync of the gradients in the next backward pass of the model after multiple forward passes under Accelerator.no_sync (only applicable in multi-GPU scenarios).

If the script is not launched in distributed mode, this context manager does nothing.

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> dataloader, model, optimizer = accelerator.prepare(dataloader, model, optimizer)

>>> with accelerator.no_sync():
...     loss_a = loss_func(model(input_a))  # first forward pass
...     loss_b = loss_func(model(input_b))  # second forward pass
>>> accelerator.backward(loss_a)  # No synchronization across processes, only accumulate gradients
>>> with accelerator.trigger_sync_in_backward(model):
...     accelerator.backward(loss_b)  # Synchronization across all processes
>>> optimizer.step()
>>> optimizer.zero_grad()

unscale_gradients


( optimizer = None )


Unscale the gradients in mixed precision training with AMP. This is a noop in all other settings.

Likely should be called through Accelerator.clip_grad_norm_() or Accelerator.clip_grad_value_().

Example:

>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> model, optimizer = accelerator.prepare(model, optimizer)
>>> outputs = model(inputs)
>>> loss = loss_fn(outputs, labels)
>>> accelerator.backward(loss)
>>> accelerator.unscale_gradients(optimizer=optimizer)

unwrap_model


( model keep_fp32_wrapper: bool = True keep_torch_compile: bool = True ) → torch.nn.Module


The unwrapped model.

Unwraps the model from the additional layer possibly added by prepare(). Useful before saving the model.

Example:

>>> from torch.nn.parallel import DistributedDataParallel
>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> model = accelerator.prepare(MyModel())
>>> print(model.__class__.__name__)
DistributedDataParallel

>>> model = accelerator.unwrap_model(model)
>>> print(model.__class__.__name__)
MyModel

verify_device_map


( model: torch.nn.Module )

Verifies that model has not been prepared with big model inference with a device-map resembling auto.

wait_for_everyone

Will stop the execution of the current process until every other process has reached that point (so this does nothing when the script is only run in one process). Useful to do before saving a model.

Example:

>>> import time
>>> from accelerate import Accelerator

>>> accelerator = Accelerator()
>>> if accelerator.is_main_process:
...     time.sleep(2)
... else:
...     print("I'm waiting for the main process to finish its sleep...")
>>> accelerator.wait_for_everyone()
>>> # Should print on every process at the same time
>>> print("Everyone is here")

accelerate.utils.gather_object


( object: typing.Any )


Recursively gather object in a nested list/tuple/dictionary of objects from all devices.