This document is relevant for: Inf1, Inf2, Trn1, Trn2
AWS Neuron Reference for NeMo Megatron (neuronx-nemo-megatron) Release Notes#
Table of contents
- neuronx-nemo-megatron [0.8.0]
- neuronx-nemo-megatron [0.7.0]
- neuronx-nemo-megatron [0.6.0]
- neuronx-nemo-megatron [0.5.0]
- neuronx-nemo-megatron [0.4.0]
- neuronx-nemo-megatron [0.3.0]
This document lists the release notes for the neuronx-nemo-megatron library.
neuronx-nemo-megatron [0.8.0]#
Date: 12/20/2024
New in this release#
- Added support for HuggingFace to NeMo checkpoint conversion when virtual pipeline parallelism is enabled.
- Added support for Python 3.11.
- Added support for PyTorch 2.5.
- Added collective compute coalescing for the ZeRO-1 optimizer (see the sketch after this list).
- Fixed flash attention to ensure proper mixed-precision data type handling.
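Collective compute coalescing reduces the number of collective calls the ZeRO-1 optimizer issues by flattening several small gradient tensors into one buffer before a single collective operation. The snippet below is a minimal, framework-agnostic illustration of that idea using plain `torch.distributed`; it is not the neuronx-nemo-megatron implementation, and the helper name is hypothetical.

```python
import torch
import torch.distributed as dist

def coalesced_all_reduce(grads: list[torch.Tensor]) -> None:
    """Hypothetical helper: all-reduce many gradients with one collective call."""
    # Flatten all gradients into one contiguous 1-D buffer.
    flat = torch.cat([g.reshape(-1) for g in grads])

    # One collective call instead of len(grads) separate calls.
    dist.all_reduce(flat, op=dist.ReduceOp.SUM)

    # Scatter the reduced values back into the original tensors.
    offset = 0
    for g in grads:
        numel = g.numel()
        g.copy_(flat[offset:offset + numel].view_as(g))
        offset += numel
```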
Known Issues and Limitations#
None at this time.
neuronx-nemo-megatron [0.7.0]#
Date: 09/16/2024
New in this release#
- Fixed an issue with the linear warmup with cosine annealing learning-rate schedule (a sketch of the schedule follows this list).
- Fixed indexing issues with MPI job checkpoint conversion.
- Fixed a pipeline-parallel bug in NeMo to HF checkpoint conversion.
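For reference, a linear warmup with cosine annealing schedule ramps the learning rate linearly from zero to its peak over the warmup steps, then decays it toward a minimum along a cosine curve. The snippet below is a minimal sketch of that schedule, not the NeMo scheduler implementation; the function name and parameters are illustrative.

```python
import math

def warmup_cosine_lr(step: int, max_lr: float, min_lr: float,
                     warmup_steps: int, total_steps: int) -> float:
    """Illustrative linear-warmup + cosine-annealing learning-rate schedule."""
    if step < warmup_steps:
        # Linear warmup from 0 to max_lr.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```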
Known Issues and Limitations#
None at this time.
neuronx-nemo-megatron [0.6.0]#
Date: 07/03/2024
New in this release#
- Added support for FP32 gradient accumulation.
- Added support for a flash attention kernel.
- Added an option for ZeRO-1 with master weights (a sketch of the master-weight pattern follows this list).
- Improved the checkpoint conversion scripts.
- Improved S3 checkpointing.
- Improved ZeRO-1 checkpointing.
- Various bug fixes and improvements.
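With master weights, the optimizer keeps an FP32 copy of each parameter and applies updates there, while the model itself runs in a lower precision such as BF16; FP32 gradient accumulation likewise accumulates gradients in FP32 to avoid precision loss across micro-batches. The sketch below illustrates the general pattern in plain PyTorch and is not the neuronx-nemo-megatron or ZeRO-1 implementation; `model` and `micro_batches` are assumed to exist and the loss access is HuggingFace-style.

```python
import torch

# Assume `model` is a BF16 model; keep FP32 master copies of its parameters.
master_params = [p.detach().clone().float() for p in model.parameters()]
grad_buffers = [torch.zeros_like(mp) for mp in master_params]   # FP32 accumulation
optimizer = torch.optim.AdamW(master_params, lr=1e-4)

for micro_batch in micro_batches:
    loss = model(micro_batch).loss
    loss.backward()
    # Accumulate BF16 gradients into FP32 buffers.
    for p, buf in zip(model.parameters(), grad_buffers):
        buf.add_(p.grad.float())
        p.grad = None

# Step in FP32, then copy the updated master weights back into the BF16 model.
for mp, buf in zip(master_params, grad_buffers):
    mp.grad = buf
optimizer.step()
optimizer.zero_grad(set_to_none=True)
with torch.no_grad():
    for p, mp in zip(model.parameters(), master_params):
        p.copy_(mp.to(p.dtype))
for buf in grad_buffers:
    buf.zero_()
```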
Known Issues and Limitations#
None at this time.
neuronx-nemo-megatron [0.5.0]#
Date: 04/01/2024
New in this release#
- Added support for LoRA fine-tuning (a minimal LoRA sketch follows this list).
- Added support for Mistral 7B and sliding-window attention.
- Added support for ZeRO-1 with automatic mixed precision.
- Improved throughput at the scale of hundreds of nodes.
- Improved support for FP32 optimizer states.
- Merged the up and gate projections in Llama for improved throughput.
- Various bug fixes and improvements.
- Fixed checkpoint restoration accuracy issues.
- Fixed ZeRO-1 checkpointing issues.
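LoRA (Low-Rank Adaptation) freezes the base weight and learns a low-rank update B·A scaled by alpha/r, so only a small number of parameters are trained during fine-tuning. The module below is a minimal, self-contained sketch of a LoRA-wrapped linear layer; it is illustrative and not the NeMo or neuronx-nemo-megatron implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: y = base(x) + (alpha / r) * B(A(x))."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)    # freeze the pretrained weight
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)        # start as a zero (identity) update
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```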
Known Issues and Limitations#
None at this time.
neuronx-nemo-megatron [0.4.0]#
Date: 10/15/2023
New in this release#
- Added Llama 70B pre-training and fine-tuning support that works with tensor parallelism and pipeline parallelism, using Grouped-Query Attention (GQA; a minimal sketch follows this list).
- Added GPT-NeoX 20B using tensor parallelism and pipeline parallelism.
- Added checkpoint conversion scripts from NeMo to HuggingFace models for Llama 7B, 13B, 70B, and GPT-NeoX fine-tuning.
- Stability fixes for hangs observed in long-running jobs that checkpoint at regular time intervals.
- Enabled Python 3.10 support with NeMo.
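In Grouped-Query Attention, several query heads share one key/value head, which shrinks the KV cache and the KV projection cost relative to full multi-head attention. The function below is a minimal sketch of the grouping step (repeating each KV head across its query group before standard attention); it is illustrative, not the model's actual attention code.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q: [batch, n_q_heads, seq, dim]; k, v: [batch, n_kv_heads, seq, dim]."""
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group_size = n_q_heads // n_kv_heads
    # Each KV head serves `group_size` query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```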
Known Issues and Limitations#
- We are seeing a few more graph compilations than before. These do not limit functionality or performance.
- Llama 2 70B: tested and validated on 8 nodes. Scaling beyond this may run into memory issues.
neuronx-nemo-megatron [0.3.0]#
Date: 9/15/2023
New in this release#
- Added Llama 13B model support that works with tensor parallelism and pipeline parallelism.
- Added ZeRO-1 optimizer support that works with tensor parallelism and pipeline parallelism.
- Fixed out-of-memory (OOM) issues when loading and saving checkpoints for large models.
- Added Docker support.
- Added a feature to save only the latest checkpoint and delete previous ones to conserve disk space (a sketch follows this list).
- Added an FP32 optimizer state option for mixed precision.
- Added validation loop support.
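Keeping only the latest checkpoint is a simple disk-space policy: after writing a new checkpoint, remove the older ones. The snippet below is a generic sketch of that policy using the filesystem directly; the directory layout and file naming are assumptions, not the library's actual checkpointing code.

```python
from pathlib import Path
import torch

def save_latest_only(state: dict, ckpt_dir: str, step: int) -> None:
    """Save a checkpoint for `step` and delete all previously saved ones."""
    out_dir = Path(ckpt_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    new_path = out_dir / f"checkpoint_step{step}.pt"
    torch.save(state, new_path)
    # Remove every other checkpoint so only the latest remains on disk.
    for old in out_dir.glob("checkpoint_step*.pt"):
        if old != new_path:
            old.unlink()
```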
Known Issues and Limitations#
- Validation logic has been tested with smaller global batch sizes (32); larger global batch sizes have not been tested.