Core features of the SageMaker model parallelism library v2


The Amazon SageMaker AI model parallelism library v2 (SMP v2) offers distribution strategies and memory-saving techniques, such as sharded data parallelism, tensor parallelism, and checkpointing. The model parallelism strategies and techniques offered by SMP v2 help distribute large models across multiple devices while optimizing training speed and memory consumption. SMP v2 also provides the torch.sagemaker Python package, which lets you adapt your existing training script with only a few lines of code changes.
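As a rough illustration of the kind of change involved, the following sketch initializes SMP v2 through the torch.sagemaker package and then shards a model with PyTorch FSDP. This follows the general SMP v2 pattern rather than being a complete recipe; the placeholder model and training details are assumptions, so verify the exact calls against the SMP v2 adaptation guide.

```python
# Minimal sketch of adapting a PyTorch training script for SMP v2.
# Assumes the SMP v2 library is installed, which provides torch.sagemaker.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

import torch.sagemaker as tsm

tsm.init()  # initialize SMP v2 before building and wrapping the model

model = nn.Linear(1024, 1024)  # placeholder for your actual model
model = FSDP(model)            # shard parameters with PyTorch FSDP

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# ... the rest of a standard PyTorch training loop continues unchanged ...
```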

This guide follows the basic two-step flow introduced in Use the SageMaker model parallelism library v2. To dive deep into the core features of SMP v2 and how to use them, see the following topics.

Note

These core features are available in SMP v2.0.0 and later with the SageMaker Python SDK v2.200.0 and later, and they work with PyTorch v2.0.1 and later. To check the versions of the packages, see Supported frameworks and AWS Regions.
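If it helps to confirm your environment locally, a quick check of the installed package versions might look like the following. The torch and sagemaker version attributes are standard; the version thresholds in the comments come from the note above.

```python
# Quick sanity check of the package versions relevant to SMP v2.
import sagemaker
import torch

print("SageMaker Python SDK:", sagemaker.__version__)  # want v2.200.0 or later
print("PyTorch:", torch.__version__)                   # want v2.0.1 or later
```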

Topics

Hybrid sharded data parallelism
