Tutorials
DeepSpeed Mixture-of-Quantization (MoQ)
DeepSpeed introduces new support for model compression using quantization, called Mixture-of-Quantization (MoQ). MoQ is designed on top of QAT (Quantization-Aware Training)...
Installation Details
The quickest way to get started with DeepSpeed is via pip; this will install the latest release of DeepSpeed, which is not tied to specific PyTorch or CUDA versions...
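
A minimal post-install sanity check, assuming DeepSpeed was installed with `pip install deepspeed` (the `ds_report` command that ships with DeepSpeed can additionally be run from the shell to inspect op compatibility):

```python
# Assumes DeepSpeed was installed via: pip install deepspeed
# (run `ds_report` from the shell to see which optional C++/CUDA ops are buildable)
import deepspeed

# Confirm the install and print the version that pip resolved
print(deepspeed.__version__)
```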
Autotuning
Automatically discover the optimal DeepSpeed configuration that delivers good training speed
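
As a rough illustration, a hedged sketch of an autotuning-enabled DeepSpeed config; the key names follow my reading of the Autotuning tutorial and the values are placeholders, so verify them for your DeepSpeed version:

```python
# Sketch of a DeepSpeed config dict (normally serialized to ds_config.json)
# with the autotuner turned on; keys and launcher flags are assumptions to check.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",  # let the autotuner choose it
    "autotuning": {
        "enabled": True,
        "fast": True,  # fast mode: tune only ZeRO stage and micro-batch size
    },
}
# The tuning pass is then driven by the deepspeed launcher's autotuning mode,
# e.g. (illustrative): deepspeed --autotuning tune train.py --deepspeed_config ds_config.json
```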
DeepNVMe
This tutorial will show how to use DeepNVMe for data transfers between persistent storage and tensors residing in host or device memory. DeepNVMe improves th...
Domino
Domino achieves near-complete communication hiding behind computation for tensor parallel training. Please find the Domino tutorial in the DeepSpeedExamples repo.
Flops Profiler
Measure the parameters, latency, and floating-point operations of your model
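
For orientation, a small sketch using the standalone profiler API (`get_model_profile`); the torchvision model and input shape are placeholders, and the three-value return reflects current DeepSpeed releases:

```python
import torchvision.models as models
from deepspeed.profiling.flops_profiler import get_model_profile

# Profile a placeholder model on a single 224x224 RGB input.
model = models.resnet18()
flops, macs, params = get_model_profile(
    model,
    input_shape=(1, 3, 224, 224),  # (batch, channels, height, width)
    print_profile=True,            # print the per-module breakdown
    detailed=True,
)
```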
Megatron-LM GPT2
If you haven’t already, we advise you to first read through the Getting Started guide before stepping through this tutorial.
Mixed Precision ZeRO++
Mixed Precision ZeRO++ (MixZ++) is a set of optimization strategies based on ZeRO and ZeRO++ to improve the efficiency and reduce memory usage for large mode...
Mixture of Experts for NLG models
In this tutorial, we introduce how to apply DeepSpeed Mixture of Experts (MoE) to NLG models, which reduces the training cost by 5 times and reduces the MoE m...
Mixture of Experts
DeepSpeed v0.5 introduces new support for training Mixture of Experts (MoE) models. MoE models are an emerging class of sparsely activated models that have s...
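
The tutorial centers on the `deepspeed.moe.layer.MoE` wrapper; below is a minimal, hedged sketch of a module that wraps a toy expert with it. Sizes, expert count, and gating choice are illustrative, and the layer is meant to be instantiated inside a model launched with the `deepspeed` launcher, since it sets up expert-parallel groups at construction time:

```python
import torch
from deepspeed.moe.layer import MoE

class MoEBlock(torch.nn.Module):
    """Toy block replacing a dense MLP with a sparsely activated MoE layer."""
    def __init__(self, hidden_size=1024, num_experts=8):
        super().__init__()
        expert = torch.nn.Linear(hidden_size, hidden_size)  # the expert to replicate
        self.moe = MoE(
            hidden_size=hidden_size,
            expert=expert,
            num_experts=num_experts,  # experts shared across the expert-parallel group
            ep_size=1,                # expert-parallel degree (assumed 1 here)
            k=1,                      # top-1 gating
        )

    def forward(self, x):
        out, aux_loss, _ = self.moe(x)  # returns (output, gate aux loss, expert counts)
        return out, aux_loss
```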
DeepSpeed Model Compression Library
What is DeepSpeed Compression: DeepSpeed Compression is a library purposely built to make it easy to compress models for researchers and practitioners while ...
Monitor
Monitor your model’s training metrics live and log for future analysis
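
A hedged sketch of the monitoring section of a DeepSpeed config; paths and job names are placeholders, and only the backends you enable need to be present:

```python
# Monitoring backends are configured directly in the DeepSpeed config
# (normally ds_config.json); this dict form is shown for illustration.
monitor_config = {
    "tensorboard": {
        "enabled": True,
        "output_path": "output/ds_logs/",  # placeholder log directory
        "job_name": "train_run",           # placeholder run name
    },
    "csv_monitor": {
        "enabled": True,
        "output_path": "output/ds_logs/",
        "job_name": "train_run",
    },
    "wandb": {
        "enabled": False,  # flip to True (and add team/group/project) to log to W&B
    },
}
```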
1-Cycle Schedule
This tutorial shows how to implement 1Cycle schedules for learning rate and momentum in PyTorch.
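
In DeepSpeed the schedule is typically driven from the config's `scheduler` block; the sketch below uses the OneCycle scheduler type with placeholder step sizes and LR/momentum bounds (see the tutorial for the full parameter list):

```python
# Sketch of the scheduler section of a DeepSpeed config using OneCycle.
# Step counts and LR/momentum bounds are illustrative, not recommendations.
scheduler_config = {
    "scheduler": {
        "type": "OneCycle",
        "params": {
            "cycle_first_step_size": 1000,   # steps ramping LR from min to max
            "cycle_second_step_size": 1000,  # steps ramping LR back down
            "cycle_min_lr": 1e-5,
            "cycle_max_lr": 1e-3,
            "cycle_min_mom": 0.85,           # momentum moves opposite to LR
            "cycle_max_mom": 0.99,
            "decay_step_size": 1000,         # length of the post-cycle decay phase
        },
    }
}
```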
Pipeline Parallelism
DeepSpeed v0.3 includes new support for pipeline parallelism! Pipeline parallelism improves both the memory and compute efficiency of deep learning training ...
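
The pipeline engine works on models expressed as a sequence of layers; below is a minimal, hedged sketch in which layer sizes and stage count are placeholders, and the script is assumed to run under the `deepspeed` launcher so the process group can be created:

```python
import torch
import deepspeed
from deepspeed.pipe import PipelineModule

# Assumes launch via the `deepspeed` launcher, which provides the environment
# needed to set up the distributed process group used by PipelineModule.
deepspeed.init_distributed()

layers = [torch.nn.Linear(1024, 1024) for _ in range(8)]  # placeholder layer sequence
net = PipelineModule(
    layers=layers,
    num_stages=2,                 # partition the layer list across 2 pipeline stages
    loss_fn=torch.nn.MSELoss(),   # loss computed on the last stage
)
# `net` is then passed to deepspeed.initialize(...), and training uses
# engine.train_batch(data_iter) instead of a hand-written forward/backward loop.
```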
DeepSpeed Sparse Attention
In this tutorial we describe how to use DeepSpeed Sparse Attention (SA) and its building-block kernels. The easiest way to use SA is through DeepSpeed launch...
DeepSpeed Transformer Kernel
This tutorial shows how to enable the DeepSpeed transformer kernel and set its different configuration parameters.
DeepSpeed Ulysses-Offload
DeepSpeed Ulysses-Offload is a chunking and offloading scheme for long-context transformer model training, built on top of ZeRO and DeepSpeed Ulysses. I...
ZeRO-Offload
ZeRO-3 Offload consists of a subset of features in our newly released ZeRO-Infinity. Read our ZeRO-Infinity blog to learn more!
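
For orientation, a hedged sketch of a ZeRO-3 config with CPU offload of both optimizer state and parameters; batch size and precision settings are placeholders:

```python
# Sketch of the relevant pieces of a ZeRO-3 Offload config (ds_config.json
# equivalent); values are illustrative only.
zero3_offload_config = {
    "train_micro_batch_size_per_gpu": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}
```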
ZeRO++
ZeRO++ is a system of communication optimization strategies built on top of ZeRO to offer unmatched efficiency for large model training regardless of the sca...
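
The tutorial exposes ZeRO++ as three flags layered on a ZeRO-3 config; the sketch below shows them with placeholder values (the hpZ partition size is normally set to the number of GPUs per node):

```python
# Sketch of a ZeRO-3 config with the three ZeRO++ optimizations enabled.
zeropp_config = {
    "zero_optimization": {
        "stage": 3,
        "zero_quantized_weights": True,    # qwZ: quantized weight communication
        "zero_hpz_partition_size": 8,      # hpZ: secondary weight partition within a node
        "zero_quantized_gradients": True,  # qgZ: quantized gradient communication
    }
}
```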