Amazon SageMaker AI model parallelism library v1 examples
Amazon SageMaker Developer Guide
This page provides a list of blogs and Jupyter notebooks that present practical examples of implementing the SageMaker model parallelism (SMP) library v1 to run distributed training jobs on SageMaker AI.
Blogs and Case Studies
The following blogs discuss case studies about using SMP v1.
- New performance improvements in the Amazon SageMaker AI model parallelism library, AWS Machine Learning Blog (December 16, 2022)
- Train gigantic models with near-linear scaling using sharded data parallelism on Amazon SageMaker AI, AWS Machine Learning Blog (October 31, 2022)
Example notebooks
Example notebooks are provided in the SageMaker AI examples GitHub repository. To download the examples, run the following commands to clone the repository and change into the training/distributed_training/pytorch/model_parallel directory.
Note
Clone and run the example notebooks in any of the following SageMaker AI ML IDEs.
- SageMaker JupyterLab (available in Studio created after December 2023)
- SageMaker Code Editor (available in Studio created after December 2023)
- Studio Classic (available as an application in Studio created after December 2023)
- SageMaker Notebook Instances
git clone https://github.com/aws/amazon-sagemaker-examples.git
cd amazon-sagemaker-examples/training/distributed_training/pytorch/model_parallel
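The notebooks listed below all activate SMP v1 through the `distribution` argument of a SageMaker PyTorch estimator. As a minimal sketch of that configuration, the helper below builds the nested `smdistributed`/`mpi` dictionary that the estimator expects; the helper name `build_smp_distribution` and the specific parameter values are illustrative choices, not part of the official API.

```python
# Illustrative sketch of an SMP v1 `distribution` configuration.
# The helper name and default values are assumptions for this example;
# consult the SMP v1 configuration reference for the full parameter list.

def build_smp_distribution(partitions=2, microbatches=4,
                           sharded_data_parallel_degree=None,
                           processes_per_host=8):
    """Return a `distribution` dict that enables SMP v1 over MPI."""
    parameters = {
        "partitions": partitions,      # number of pipeline-parallel model partitions
        "microbatches": microbatches,  # microbatches used by pipeline parallelism
        "pipeline": "interleaved",     # pipeline schedule
        "optimize": "speed",
        "ddp": True,                   # combine with data parallelism
    }
    if sharded_data_parallel_degree is not None:
        # Used by the sharded data parallelism notebooks listed below.
        parameters["sharded_data_parallel_degree"] = sharded_data_parallel_degree
    return {
        "smdistributed": {
            "modelparallel": {"enabled": True, "parameters": parameters}
        },
        "mpi": {"enabled": True, "processes_per_host": processes_per_host},
    }

config = build_smp_distribution(sharded_data_parallel_degree=8)
print(config["smdistributed"]["modelparallel"]["enabled"])  # True
```

In the notebooks, a dictionary like this is passed as `distribution=config` to `sagemaker.pytorch.PyTorch(...)`; each notebook sets its own parameter values, so treat the values above as placeholders.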
SMP v1 example notebooks for PyTorch
- Train GPT-2 with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
- Fine-tune GPT-2 with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
- Train GPT-NeoX-20B with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
- Train GPT-J 6B using the sharded data parallelism and tensor parallelism techniques in the SageMaker model parallelism library
- Train FLAN-T5 with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
- Train Falcon with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
SMP v1 example notebooks for TensorFlow
- CNN with TensorFlow 2.3.1 and the SageMaker model parallelism library
- Hugging Face transformer models with TensorFlow and the SageMaker model parallelism library