Amazon SageMaker AI model parallelism library v1 examples
Amazon SageMaker Developer Guide
This page provides a list of blogs and Jupyter notebooks that present practical examples of implementing the SageMaker model parallelism (SMP) library v1 to run distributed training jobs on SageMaker AI.
Blogs and Case Studies
The following blogs discuss case studies about using SMP v1.
- New performance improvements in the Amazon SageMaker AI model parallelism library, AWS Machine Learning Blog (December 16, 2022)
- Train gigantic models with near-linear scaling using sharded data parallelism on Amazon SageMaker AI, AWS Machine Learning Blog (October 31, 2022)
Example notebooks
Example notebooks are provided in the SageMaker AI examples GitHub repository. To download the examples, run the following commands to clone the repository and change into the training/distributed_training/pytorch/model_parallel directory.
Note
Clone and run the example notebooks in any of the following SageMaker AI ML IDEs.
- SageMaker JupyterLab (available in Studio created after December 2023)
- SageMaker Code Editor (available in Studio created after December 2023)
- Studio Classic (available as an application in Studio created after December 2023)
- SageMaker Notebook Instances
git clone https://github.com/aws/amazon-sagemaker-examples.git
cd amazon-sagemaker-examples/training/distributed_training/pytorch/model_parallel
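The notebooks listed below all activate SMP v1 through the `distribution` argument of a SageMaker PyTorch estimator. As a minimal sketch of that configuration, the helper below builds the nested `smdistributed`/`mpi` dictionary that the estimator expects; the helper name `build_smp_distribution` and the specific parameter values are illustrative choices, not part of the official API.

```python
# Illustrative sketch of an SMP v1 `distribution` configuration.
# The helper name and default values are assumptions for this example;
# consult the SMP v1 configuration reference for the full parameter list.

def build_smp_distribution(partitions=2, microbatches=4,
                           sharded_data_parallel_degree=None,
                           processes_per_host=8):
    """Return a `distribution` dict that enables SMP v1 over MPI."""
    parameters = {
        "partitions": partitions,      # number of pipeline-parallel model partitions
        "microbatches": microbatches,  # microbatches used by pipeline parallelism
        "pipeline": "interleaved",     # pipeline schedule
        "optimize": "speed",
        "ddp": True,                   # combine with data parallelism
    }
    if sharded_data_parallel_degree is not None:
        # Used by the sharded data parallelism notebooks listed below.
        parameters["sharded_data_parallel_degree"] = sharded_data_parallel_degree
    return {
        "smdistributed": {
            "modelparallel": {"enabled": True, "parameters": parameters}
        },
        "mpi": {"enabled": True, "processes_per_host": processes_per_host},
    }

config = build_smp_distribution(sharded_data_parallel_degree=8)
print(config["smdistributed"]["modelparallel"]["enabled"])  # True
```

In the notebooks, a dictionary like this is passed as `distribution=config` to `sagemaker.pytorch.PyTorch(...)`; each notebook sets its own parameter values, so treat the values above as placeholders.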
SMP v1 example notebooks for PyTorch
- Train GPT-2 with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
- Fine-tune GPT-2 with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
- Train GPT-NeoX-20B with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
- Train GPT-J 6B using the sharded data parallelism and tensor parallelism techniques in the SageMaker model parallelism library
- Train FLAN-T5 with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
- Train Falcon with near-linear scaling using the sharded data parallelism technique in the SageMaker model parallelism library
SMP v1 example notebooks for TensorFlow
- CNN with TensorFlow 2.3.1 and the SageMaker model parallelism library
- Hugging Face transformer models with TensorFlow and the SageMaker model parallelism library