ColossalAIOptimWrapper — mmengine 0.10.7 documentation
class mmengine._strategy.colossalai.ColossalAIOptimWrapper(optimizer, booster=None, accumulative_counts=1)[source]¶
OptimWrapper for ColossalAI.
The available optimizers are:
- CPUAdam
- FusedAdam
- FusedLAMB
- FusedSGD
- HybridAdam
- Lamb
- Lars
You can find more details in the ColossalAI tutorial.
Parameters:
- optimizer (dict or torch.optim.Optimizer) – The optimizer to be wrapped.
- accumulative_counts (int) – The number of iterations to accumulate gradients. The parameters will be updated every accumulative_counts iterations.
- booster (Booster or None) – The ColossalAI Booster used to run the backward pass; it is normally provided by the strategy. Defaults to None.
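A minimal configuration sketch, assuming the wrapper is built by mmengine's ColossalAI strategy (e.g. through FlexibleRunner); the HybridAdam hyperparameters and the accumulative_counts value below are illustrative only:

```python
# Illustrative optim_wrapper config for a ColossalAI-backed run.
# The strategy resolves type='HybridAdam' to one of the ColossalAI
# optimizers listed above and wraps it in ColossalAIOptimWrapper.
optim_wrapper = dict(
    type='ColossalAIOptimWrapper',
    optimizer=dict(type='HybridAdam', lr=1e-3),
    accumulative_counts=4,  # update parameters every 4 iterations
)
```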
backward(loss, **kwargs)[source]¶
Perform gradient backpropagation.
Provide a unified backward interface compatible with automatic mixed precision training. Subclasses can overload this method to implement the required logic. For example, torch.cuda.amp requires some extra operations on GradScaler during the backward process.
Note
If a subclass inherits from OptimWrapper and overrides backward, it must also implement _inner_count += 1.
Parameters:
- loss (torch.Tensor) – The loss of current iteration.
- kwargs – Keyword arguments passed to torch.Tensor.backward().
Return type:
None
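A short sketch of how backward typically fits into a manual gradient-accumulation loop; optim_wrapper is an already-built ColossalAIOptimWrapper, while model and data_loader are hypothetical names from the surrounding training setup:

```python
for data_batch in data_loader:
    loss = model(**data_batch)          # forward pass producing a scalar loss
    optim_wrapper.backward(loss)        # backward pass; increments the inner counter
    if optim_wrapper.should_update():   # True every accumulative_counts iterations
        optim_wrapper.step()            # update parameters
        optim_wrapper.zero_grad()       # clear the accumulated gradients
```

In practice, optim_wrapper.update_params(loss), inherited from OptimWrapper, bundles these three calls into one.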
optim_context(model)[source]¶
A context for gradient accumulation and automatic mixed precision training.
If subclasses need to enable the context for mixed precision training, e.g. AmpOptimWrapper, the corresponding context should be enabled in optim_context. Since OptimWrapper defaults to fp32 training, optim_context will only enable the context that blocks unnecessary gradient synchronization during gradient accumulation.
If model is an instance with a no_sync method (which means the gradient synchronization can be blocked) and self._accumulative_counts != 1, gradient synchronization is skipped on the intermediate accumulation iterations and only performed on the iteration where the parameters are about to be updated. Otherwise, this method will enable an empty context.
Parameters:
model (nn.Module) – The training model.
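A minimal sketch of how a training step could use this context, assuming model, optim_wrapper, and data_batch already exist; for models that expose no_sync (e.g. DDP-style wrappers), the context suppresses redundant gradient synchronization on intermediate accumulation iterations:

```python
with optim_wrapper.optim_context(model):
    # Run the forward pass inside the context so that the following backward
    # skips gradient synchronization on intermediate accumulation steps.
    loss = model(**data_batch)
optim_wrapper.update_params(loss)  # backward + step/zero_grad once the window is full
```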