Neuron Documentation Release Notes#
Table of contents
- Neuron 2.21.0
- Neuron 2.20.0
- Neuron 2.19.0
- Neuron 2.18.0
- Neuron 2.16.0
- Neuron 2.15.0
- Neuron 2.14.0
- Neuron 2.13.0
- Neuron 2.12.0
- Neuron 2.11.0
Neuron 2.21.0#
Date: 12/20/2024
Neuron Architecture and Features
- Added Trainium2 Architecture guide. See Trainium2 Architecture
- Added Trn2 Architecture guide. See Amazon EC2 Trn2 Architecture
- Added Logical NeuronCore configuration guide. See Logical NeuronCore configuration
- Added NeuronCore-v3 Architecture guide. See NeuronCore-v3 Architecture
Neuron Compiler
- Added NKI tutorial for SPMD usage with multiple Neuron Cores on Trn2. See tutorial
- Updated NKI FAQ with Trn2 FAQs. See NKI FAQ
- Added Direct Allocation Developer Guide
- Updated nki.isa API guide with support for new APIs.
- Updated nki.language API guide with support for new APIs.
- Updated nki.compiler API guide with support for new APIs.
- Updated NKI datatype guide with support for float8_e5m2.
- Updated kernels with support for allocated_fused_self_attn_for_SD_small_head_size and allocated_fused_rms_norm_qkv kernels
Neuron Runtime
- Updated troubleshooting doc with information on device out-of-memory errors after upgrading to Neuron Driver 2.19 or later. See small_allocations_mempool
NeuronX Distributed Inference
- Added Application Note to introduce NxD Inference. See Introducing NeuronX Distributed (NxD) Inference
- Added NxD Inference Supported Features Guide. See NxD Inference Features Configuration Guide
- Added NxD Inference Tutorial for Deploying Llama 3.1 405B (Trn2). See Tutorial: Deploying Llama3.1 405B (Trn2)
- Added NxD Inference API Reference Guide. See nxd-inference-api-guides
- Added NxD Inference Production Ready Models (Model Hub) Guide. See NxD Inference - Production Ready Models
- Added Migration Guide from NxD examples to NxD Inference. See Migrating from NxD Core inference examples to NxD Inference
- Added Migration Guide from Transformers NeuronX to NeuronX Distributed Inference. See Migrating from Transformers NeuronX to NeuronX Distributed(NxD) Inference
- Added vLLM User Guide for NxD Inference. See vLLM User Guide for NxD Inference
- Added tutorial for deploying Llama3.2 Multimodal Models. See Tutorial: Deploying Llama3.2 Multimodal Models
NeuronX Distributed Training
- Updated Training APIs, Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism, YAML Configuration Settings, and Checkpoint Conversion with support for fused Q,K,V
- Updated YAML Configuration Settings with support for Trn2 configuration API
- Updated Direct Checkpoint Conversion with support for HuggingFace Model Conversion
- Added tutorial for HuggingFace Llama3.1/Llama3-70B Pretraining. See HuggingFace Llama3.1/Llama3-70B Pretraining
- Added tutorial for HuggingFace Llama3-8B Direct Preference Optimization (DPO) based Fine-tuning. See hf_llama3_8B_DPO
Transformers NeuronX
- Updated Transformers NeuronX (transformers-neuronx) Developer Guide and PyTorch NeuronX Tracing API for Inference with support for CPU compilation.
- Updated Transformers NeuronX (transformers-neuronx) Developer Guide to enable skipping the first Allgather introduced by flash decoding at the cost of duplicate Q weights.
- Updated Transformers NeuronX (transformers-neuronx) Developer Guide with support for EAGLE speculation
Neuron Tools
- Added Neuron Profiler 2.0 Beta User Guide with support for system profiles, integration with Perfetto, distributed workload support, etc. See Neuron Profiler 2.0 (Beta) User Guide
- Updated nccom-test user guide to include support for Trn2. See NCCOM-TEST User Guide
- Updated neuron-ls user guide to include support for Trn2. See Neuron LS User Guide
- Updated neuron-monitor user guide to include support for Trn2. See Neuron Monitor User Guide
- Updated neuron-top user guide to include support for Trn2. See Neuron Top User Guide
- Added Ask Q Developer documentation for general Neuron guidance and jumpstarting NKI kernel development. See Ask Q Developer
PyTorch NeuronX
- Added troubleshooting note for eager debug mode errors. See PyTorch Neuron (torch-neuronx) for Training Troubleshooting Guide
- Added torch-neuronx cxx11 ABI documentation. See Install with support for C++11 ABI
- Added Migration Guide From XLA_USE_BF16/XLA_DOWNCAST_BF16. See Migration From XLA_USE_BF16/XLA_DOWNCAST_BF16
- Updated BERT tutorial to not use XLA_DOWNCAST_BF16 and updated BERT-Large pretraining phase to BFloat16 BERT-Large pretraining with AdamW and stochastic rounding. See Hugging Face BERT Pretraining Tutorial (Data-Parallel)
- Added Application Note for PyTorch 2.5 support. See Introducing PyTorch 2.5 Support
- Updated PyTorch NeuronX Environment Variables document with support for PyTorch 2.5. See PyTorch NeuronX Environment Variables
Misc
- Added a third-party developer flow solutions page. See Third-party solutions
- Added a third-party libraries page. See Third-party libraries
End of support announcements
- Announcing end of support for Neuron DET tool starting next release
- Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release
- Announcing end of support for Python 3.8 in future releases
- Announcing end of support for PyTorch 1.13 starting next release
- Announcing end of support for PyTorch 2.1 starting next release
- Neuron no longer includes support for Ubuntu20 DLCs and DLAMIs starting this release
- Announcing maintenance mode for torch-neuron 1.9 and 1.10 versions
Neuron 2.20.0#
Date: 09/16/2024
Neuron Compiler
- Added Getting Started with NKI guide for implementing a simple “Hello World” style NKI kernel and running it on a Neuron Device (Trainium/Inferentia2). See Getting Started with NKI
- Added NKI Programming Model guide for explaining the three main stages of the NKI programming model. See NKI Programming Model
- Added NKI Kernel as a Framework Custom Operator guide for explaining how to insert a NKI kernel as a custom operator into a PyTorch or JAX model using simple code examples. See NKI Kernel as a Framework Custom Operator
- Added NKI Tutorials for the following kernels: Tensor addition, Transpose2D, AveragePool2D, Matrix multiplication, RMSNorm, Fused Self Attention, LayerNorm, and Fused Mamba. See nki.kernels
- Added NKI Kernels guide for optimized kernel examples. See nki.kernels
- Added Trainium/Inferentia2 Architecture Guide for NKI. See Trainium/Inferentia2 Architecture Guide for NKI
- Added Profiling NKI kernels with Neuron Profile. See Profiling NKI kernels with Neuron Profile
- Added NKI Performance Guide for explaining a recipe to find performance bottlenecks of NKI kernels and apply common software optimizations to address such bottlenecks. See NKI Performance Guide
- Added NKI API Reference Manual with nki framework and types, nki.language, nki.isa, NKI API Common Fields, and NKI API Errors. See NKI API Reference Manual
- Added NKI FAQ. See NKI FAQ
- Added NKI Known Issues. See NKI Known Issues
- Updated Neuron Glossary with NKI terms. See Neuron Glossary
- Added new NKI samples repository
- Added average_pool2d, fused_mamba, layernorm, matrix_multiplication, rms_norm, sd_attention, tensor_addition, and transpose_2d kernel tutorials to the NKI samples repository. See NKI samples repository
- Added unit and integration tests for each kernel. See NKI samples repository
- Updated Custom Operators API Reference Guide with updated terminology (HBM). See Custom Operators API Reference Guide [Beta]
NeuronX Distributed Training (NxDT)
- Added NxDT (Beta) Developer Guide. See Developer Guide
- Added NxDT Developer Guide for Migrating from NeMo to Neuronx Distributed Training. See NxD Training Compatibility with NeMo
- Added NxDT Developer Guide for Migrating from Neuron-NeMo-Megatron to Neuronx Distributed Training. See Migrating from Neuron-NeMo-Megatron to Neuronx Distributed Training
- Added NxDT Developer Guide for Integrating a new dataset/dataloader. See Integrating a new dataset/dataloader
- Added NxDT Developer Guide for Integrating a new model. See Integrating a New Model
- Added NxDT Developer Guide for Registering an optimizer and LR scheduler. See Registering an optimizer and LR scheduler
- Added NxDT YAML Configuration Overview. See YAML Configuration Settings
- Added Neuronx Distributed Training Library Features documentation. See Neuronx Distributed Training Library Features
- Added Installation instructions for NxDT. See Setup
- Added Known Issues and Workarounds for NxDT. See Known Issues and Workarounds
NeuronX Distributed Core (NxD Core)
- Updated Developer guide for save/load checkpoint (neuronx-distributed) with ZeRO-1 Optimizer State Offline Conversion. See Developer guide for save/load checkpoint
- Added Developer guide for Standard Mixed Precision with NeuronX Distributed. See Developer guide for Standard Mixed Precision
- Updated NeuronX Distributed API Guide with LoRA finetuning support. See Distributed Strategies APIs
- Added Developer guide for LoRA finetuning with NeuronX Distributed. See Developer guide for LoRA finetuning
- Updated CodeLlama tutorial with latest package versions. See tutorial
- Added tutorial for Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning with NeuronX Distributed. See Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning
- Updated links in Llama2 NxD Finetuning tutorial. See Fine-tuning Llama2 7B with tensor parallelism and ZeRO-1 optimizer using Neuron PyTorch-Lightning
- Updated tokenizer download command in tutorials. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer, Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism, and Training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer
JAX Neuron
- Added JAX Neuron Main page. See JAX Neuron (beta)
- Added JAX Neuron plugin instructions. See jax-neuronx-setup
- Added JAX Neuron setup instructions. See JAX Setup
PyTorch NeuronX
- Updated Developer Guide for Training with PyTorch NeuronX with support for convolution in AMP. See Developer Guide for Training with PyTorch NeuronX.
- Added inference samples for Wav2Vec2 conformer models with Relative Position Embeddings and Rotary Position Embedding. See sample and sample.
- Updated the ViT sample with updated accelerate version. See sample
- Updated PyTorch NeuronX Environment Variables with NEURON_TRANSFER_WITH_STATIC_RING_OPS. See PyTorch NeuronX Environment Variables
- Added inference samples for PixArt Alpha and PixArt Sigma models. See sample and sample
- Added benchmarking scripts for PixArt Alpha. See benchmarking script
Transformers NeuronX
- Updated Transformers NeuronX Developer Guide with Multi-node inference support (TP/PP). See Transformers NeuronX (transformers-neuronx) Developer Guide
- Updated Transformers NeuronX Developer Guide with BDH layout support. See Transformers NeuronX (transformers-neuronx) Developer Guide
- Updated Transformers NeuronX Developer Guide with Flash Decoding to support long sequence lengths up to 128k. See Transformers NeuronX (transformers-neuronx) Developer Guide
- Updated Transformers NeuronX Developer Guide with presharded weights support. See Transformers NeuronX (transformers-neuronx) Developer Guide
- Added Llama 3.1 405b sample with 16k sequence length. See tutorial
- Added Llama 3.1 70b 64k tutorial. See tutorial
- Added Llama 3.1 8b 128k tutorial. See tutorial
- Removed the sample llama-3-8b-32k-sampling.ipynb and replaced it with Llama-3.1-8B model sample llama-3.1-8b-32k-sampling.ipynb. See sample
Neuron Runtime
- Updated Neuron Runtime Troubleshooting guide with the latest hardware error codes and logs and with Neuron Runtime execution fails at out-of-bound access. See Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1
- Updated Neuron Sysfs User Guide with new sysfs entries and device reset instructions. See Neuron Sysfs User Guide
- Added Neuron Runtime Input Dump on Trn1 documentation. See nrt-input-dumps
Containers
- Added Neuron Helm Chart repository to help streamline the deployment of AWS Neuron components on Amazon EKS. See repo
- Updated Kubernetes container deployment process with Neuron Helm Chart documentation. See k8s-neuron-helm-chart
- Added guide for Deploying Neuron Container on Elastic Container Service (ECS). See Deploy Neuron Container on Elastic Container Service (ECS) for Training
- Added documentation for Neuron Plugins for Containerized Environments. See Neuron Plugins for Containerized Environments
- Updated guide for locating DLC images. See Neuron Deep Learning Containers
Neuron Tools
- Updated Neuron Profiler User Guide with Alternative output formats. See Neuron Profile User Guide
Software Maintenance and Misc
- Updated the Neuron Software Maintenance Policy. See Neuron Software Maintenance policy
- Added announcement and updated documentation for end of support start for Tensorflow-Neuron 1.x. See Tensorflow-Neuron 1.x no longer supported
- Added announcement and updated documentation for end of support start for ‘neuron-device-version’ field. See ‘neuron-device-version’ field in neuron-monitor no longer supported
- Added announcement and updated documentation for end of support start for ‘neurondevice’ resource name. See ‘neurondevice’ resource name in Neuron Device K8s plugin no longer supported
- Added announcement and updated documentation for end of support start for AL2. See Neuron Runtime no longer supports Amazon Linux 2 (AL2)
- Added announcement for maintenance mode for torch-neuron versions 1.9 and 1.10. See Announcing maintenance mode for torch-neuron 1.9 and 1.10 versions
- Added supported Protobuf versions to the Neuron Release Artifacts. See Release Content
- Updated Neuron Github Roadmap. See Roadmap
Neuron 2.19.0#
Date: 07/03/2024
- Updated Transformers NeuronX Developer guide with support for inference for longer sequence lengths with Flash Attention kernel. See Developer Guide.
- Updated Transformers NeuronX developer guide with QKV Weight Fusion support. See Developer Guide.
- Updated Transformers NeuronX continuous batching developer guide with updated vLLM instructions and models supported. See Developer Guide.
- Updated Neuronx Distributed User guide with interleaved pipeline support. See Distributed Strategies APIs
- Added Codellama 13b 16k tutorial with NeuronX Distributed Inference library. See sample
- Updated PyTorch NeuronX Environment variables with custom SILU enabled via NEURON_CUSTOM_SILU. See PyTorch NeuronX Environment Variables
- Updated ZeRO1 support to have FP32 master weights support and BF16 all-gather. See ZeRO-1 Tutorial.
- Updated PyTorch 2.1 Application note with workaround for slower loss convergence for NxD LLaMA-3 70B pretraining using ZeRO1 tutorial. See introduce-pytorch-2-1.
- Updated Neuron DLAMI guide with support for new 2.19 DLAMIs. See Neuron DLAMI User Guide.
- Updated HF-BERT pre-training documentation for port forwarding. See Hugging Face BERT Pretraining Tutorial (Data-Parallel)
- Updated T5 inference tutorial with transformer flag. See sample
- Added support for Llama3 model training. See Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism and Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer
- Added support for Flash Attention kernel for training longer sequences in NeuronX Distributed. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer and Distributed Strategies APIs
- Updated Llama2 inference tutorial using NxD Inference library. See sample
- Added new guide for Neuron node problem detection and recovery tool. See configuration and tutorial.
- Added new guide for Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes. Supports monitoring with Prometheus and Grafana. See tutorial
- Updated Neuron scheduler extension documentation about enforcing allocation of contiguous Neuron Devices for the pods based on the Neuron instance type. See tutorial
- Updated Neuron Profiler User Guide with various UI enhancements. See Neuron Profile User Guide
- Added NeuronPerf support in Llama2 inference tutorial in NeuronX Distributed. See sample
- Added announcement for maintenance mode of MxNet. See Neuron support for MxNet enters maintenance mode
- Added announcement for end of support of Neuron TensorFlow 1.x (Inf1). See Announcing end of support for Tensorflow-Neuron 1.x
- Added announcement for end of support of AL2. See Announcing end of support for Neuron Runtime support of Amazon Linux 2 (AL2)
- Added announcement for end of support of ‘neuron-device-version’ field in neuron-monitor. See Announcing end of support for ‘neuron-device-version’ field in neuron-monitor
- Added announcement for end of support of ‘neurondevice’ resource name in Neuron Device K8s plugin. See Announcing end of support for ‘neurondevice’ resource name in Neuron Device K8s plugin
- Added announcement for end of support for Protobuf versions <= 3.19 for PyTorch NeuronX. See Announcing end of support for Protobuf versions <= 3.19 for PyTorch NeuronX, NeuronX Distributed, and Transformers NeuronX libraries
Neuron 2.18.0#
Date: 04/01/2024
- Updated PyTorch NeuronX developer guide with Snapshotting support. See Snapshotting With Torch-Neuronx 2.1.
- Updated Distributed Strategies APIs and Developer guide for Pipeline Parallelism with support for auto_partition API.
- Updated Distributed Strategies APIs with enhanced checkpointing support with load API and async_save API.
- Updated documentation for PyTorch Lightning to train models using pipeline parallelism. See API guide and Developer Guide.
- Updated NeuronX Distributed developer guide with support for Autobucketing
- Added PyTorch NeuronX developer guide for Autobucketing.
- Updated Distributed Strategies APIs and Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism with support for asynchronous checkpointing.
- Updated Transformers NeuronX Developer guide with support for streamer and stopping criteria APIs. See Developer Guide.
- Updated Transformers NeuronX Developer guide with instructions for Repeating N-Gram Filtering. See Developer Guide.
- Updated Transformers NeuronX developer guide with Top-K on-device sampling support [Beta]. See Developer Guide.
- Updated Transformers NeuronX developer guide with Checkpointing support and automatic model selection. See Developer Guide.
- Updated Transformers NeuronX Developer guide with support for speculative sampling [Beta]. See Developer Guide.
- Added sample for training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer with neuronx-distributed. See Training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer.
- Added Tutorial for codellama/CodeLlama-13b-hf model inference with 16K seq length using Transformers Neuronx. See sample.
- Added Mixtral-8x7B Inference Sample/Notebook using TNx. See sample.
- Added Mistral-7B-Instruct-v0.2 inference sample using TNx. See sample.
- Added announcement for Maintenance mode of TensorFlow 1.x. See Tensorflow-Neuron 1.x enters maintenance mode.
- Updated PyTorch 2.1 documentation to reflect stable (out of beta) support. See introduce-pytorch-2-1.
- Updated PyTorch NeuronX environment variables to reflect stable (out of beta) support. See PyTorch NeuronX Environment Variables.
- Updated Release Content with supported HuggingFace Transformers versions.
- Added user guide instructions for Neuron DLAMI. See Neuron DLAMI User Guide.
- Updated PyTorch Neuron for Trainium Hugging Face BERT MRPC task finetuning using Hugging Face Trainer API tutorial with latest Hugging Face Trainer API.
- Updated Neuron Runtime API guide with support for nr_tensor_allocate. See Developer’s Guide - NeuronX Runtime.
- Updated Neuron Sysfs User Guide with support for serial_number unique identifier.
- Updated Custom Operators API Reference Guide [Beta] limitations and fixed nested sublists. See Neuron Custom C++ Operators Developer Guide [Beta].
- Fixed issue in ZeRO-1 Tutorial.
- Fixed potential hang during synchronization step in nccom-test. See NCCOM-TEST User Guide.
- Updated troubleshooting guide with additional hardware error messaging. See Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1.
- Updated DLC documentation. See Customize Neuron DLC and Deploy Neuron Container on EC2.
Neuron 2.16.0#
Date: 12/21/2023
- Added setup guide instructions for AL2023 OS. See Setup Guide
- Added announcement for name change of Neuron Components. See Announcing Name Change for Neuron Components
- Added announcement for End of Support for PyTorch 1.10. See Announcing End of Support for PyTorch Neuron version 1.10
- Added announcement for End of Support for PyTorch 2.0 Beta. See Announcing End of Support for PyTorch NeuronX version 2.0 (beta)
- Added announcement for moving NeuronX Distributed sample model implementations. See Announcing deprecation for NeuronX Distributed Training Samples in Neuron Samples Repository
- Updated Transformers NeuronX developer guide with support for Grouped Query Attention (GQA). See developer guide
- Added sample for Llama-2-70b model inference. See tutorial
- Added documentation for PyTorch Lightning to train models using tensor parallelism and data parallelism. See api guide, developer guide and tutorial
- Added documentation for Model and Optimizer Wrapper training API that handles the parallelization. See api guide and Developer guide for model and optimizer wrapper
- Added documentation for new save_checkpoint and load_checkpoint APIs to save/load checkpoints during distributed training. See Developer guide for save/load checkpoint
- Added documentation for a new Query-Key-Value (QKV) module in NeuronX Distributed for Training. See api guide and tutorial
- Added new developer guide for Inference using NeuronX Distributed. See developer guide
- Added Llama-2-7B model inference script ([html] [notebook])
- Added App note on Support for PyTorch 2.1 (Beta). See introduce-pytorch-2-1
- Added developer guide for replace_weights API to replace the separated weights. See PyTorch Neuron (torch-neuronx) Weight Replacement API for Inference
- Added [Beta] script for training stabilityai/stable-diffusion-2-1-base and runwayml/stable-diffusion-v1-5 models. See script
- Added [Beta] script for training facebook/bart-large model. See script
- Added [Beta] script for stabilityai/stable-diffusion-2-inpainting model inference. See script
- Added documentation for new Neuron Distributed Event Tracing (NDET) tool to help visualize execution trace logs and diagnose errors in multi-node workloads. See Neuron Distributed Event Tracing (NDET) User Guide
- Updated Neuron Profile User guide with support for multi-worker jobs. See Neuron Profile User Guide
- Minor updates to Custom Ops API reference guide. See Custom Operators API Reference Guide [Beta]
Neuron 2.15.0#
Date: 10/26/2023
- New introduce-pytorch-2-0 application note with torch-neuronx
- New llama2_70b_tp_pp_tutorial and (sample script) using neuronx-distributed
- New Model samples and tutorials documentation for a consolidated list of code samples and tutorials published by AWS Neuron.
- New Neuron Software Classification documentation for alpha, beta, and stable Neuron SDK definitions and updated documentation references.
- New Pipeline Parallelism Overview and Developer guide for Pipeline Parallelism documentation in neuronx-distributed
- Updated Neuron Distributed API Guide regarding pipeline-parallelism support and checkpointing
- New Activation Memory Reduction application note and Developer guide for Activation Memory reduction in neuronx-distributed
- New Weight Sharing (Deduplication) notebook script
- Added Finetuning script for google/electra-small-discriminator with torch-neuronx
- Added ResNet50 training (Beta) tutorial and scripts with torch-neuronx
- Added Vision Perceiver training sample with torch-neuronx
- Added flan-t5-xl model inference tutorial using neuronx-distributed
- Added HuggingFace Stable Diffusion 4X Upscaler model Inference on Trn1 / Inf2 sample script with torch-neuronx
- Updated GPT-NeoX 6.9B and 20B model scripts to include selective checkpointing.
- Added serialization support to, and removed the -O1 flag constraint from, the Llama-2-13B model inference script tutorial with transformers-neuronx
- Updated BERT script and Llama-2-7B script with PyTorch 2.0 support
- Added option-argument llm-training to the existing --distribution_strategy compiler option to make specific optimizations related to training distributed models in Neuron Compiler CLI Reference Guide (neuronx-cc)
- Updated Neuron Sysfs User Guide to include mem_ecc_uncorrected and sram_ecc_uncorrected hardware statistics.
- Updated PyTorch NeuronX Tracing API for Inference to include io alias documentation
- Updated Transformers NeuronX (transformers-neuronx) Developer Guide with serialization support.
- Upgraded numpy version to 1.22.2 for various scripts
- Updated LanguagePerceiver fine-tuning script to stable
- Announcing End of Support for OPT example in transformers-neuronx
- Announcing End of Support for “nemo” option-argument
Known Issues and Limitations#
The following tutorials are currently not working. These tutorials will be updated once there is a fix.
Neuron 2.14.0#
Date: 09/15/2023
- Neuron Calculator now supports multiple model configurations for Tensor Parallel Degree computation. See Neuron Calculator
- Announcement to deprecate --model-type=transformer-inference flag. See Announcing deprecation for --model-type=transformer-inference compiler flag
- Updated HF ViT benchmarking script to use --model-type=transformer flag. See [script]
- Updated torch_neuronx.analyze API documentation. See PyTorch NeuronX Analyze API for Inference
- Updated Performance benchmarking numbers for models on Inf1, Inf2 and Trn1 instances with 2.14 release bits. See _benchmark
- New tutorial for Training Llama2 7B with Tensor Parallelism and ZeRO-1 Optimizer using neuronx-distributed. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer
- New tutorial for T5-3B model inference using neuronx-distributed (tutorial)
- Updated Neuron Persistent Cache documentation to clarify the flags parsed by the neuron_cc_wrapper tool, which is a wrapper over the Neuron Compiler CLI. See Neuron Persistent Cache
- Added tokenizers_parallelism=true in various notebook scripts to suppress tokenizer warnings, making errors easier to detect
- Updated Neuron device plugin and scheduler YAMLs to point to latest images. See yaml configs
- Added notebook script to fine-tune deepmind/language-perceiver model using torch-neuronx. See sample script
- Added notebook script to fine-tune clip-large model using torch-neuronx. See sample script
- Added SD XL Base+Refiner inference sample script using torch-neuronx. See sample script
- Upgraded default diffusers library from 0.14.0 to latest 0.20.2 in Stable Diffusion 1.5 and Stable Diffusion 2.1 inference scripts. See sample scripts
- Added Llama-2-13B model training script using neuronx-nemo-megatron (tutorial)
Neuron 2.13.0#
Date: 08/28/2023
- Added tutorials for GPT-NEOX 6.9B and 20B models training using neuronx-distributed. See more at Tutorials for NeuronX Distributed
- Added TensorFlow 2.x (tensorflow-neuronx) analyze_model API section. See more at TensorFlow 2.x (tensorflow-neuron) analyze_model API
- Updated setup instructions to fix path of existing virtual environments in DLAMIs. See more at setup guide
- Updated setup instructions to fix pinned versions in upgrade instructions of setup guide. See more at setup guide
- Updated tensorflow-neuron HF distilbert tutorial to improve performance by removing HF pipeline. See more at [html] [notebook]
- Updated training troubleshooting guide in torch-neuronx to describe network Connectivity Issue on trn1/trn1n 32xlarge with Ubuntu. See more at PyTorch Neuron (torch-neuronx) for Training Troubleshooting Guide
- Added “Unsupported Hardware Operator Code” section to Neuron Runtime Troubleshooting page. See more at Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1
- Removed ‘beta’ tag from neuronx-distributed section for training. neuronx-distributed Training is now considered stable and neuronx-distributed inference is considered as beta.
- Added FLOP count (flop_count) and connected Neuron Device ids (connected_devices) to sysfs userguide. See Neuron Sysfs User Guide
- Added tutorial for T5 model inference. See more at [notebook]
- Updated neuronx-distributed api guide and inference tutorial. See more at Distributed Strategies APIs and Inference with Tensor Parallelism [Beta]
- Announcing End of support for AWS Neuron reference for Megatron-LM starting Neuron 2.13. See more at AWS Neuron reference for Megatron-LM no longer supported
- Announcing end of support for torch-neuron version 1.9 starting Neuron 2.14. See more at Announcing end of support for torch-neuron version 1.9
- Upgraded numpy version to 1.21.6 in various training scripts for Text Classification
- Added license for Nemo Megatron to SDK Maintenance Policy. See more at Neuron Software Maintenance policy
- Updated bert-japanese training Script to use multilingual-sentiments dataset. See hf-bert-jp (aws-neuron/aws-neuron-samples)
- Added sample script for LLaMA V2 13B model inference using transformers-neuronx. See neuron samples repo
- Added samples for training GPT-NEOX 20B and 6.9B models using neuronx-distributed. See neuron samples repo
- Added sample scripts for CLIP and Stable Diffusion XL inference using torch-neuronx. See neuron samples repo
- Added sample scripts for vision and language Perceiver models inference using torch-neuronx. See neuron samples repo
- Added camembert training/finetuning example for Trn1 under hf_text_classification in torch-neuronx. See neuron samples repo
- Updated Fine-tuning Hugging Face BERT Japanese model sample in torch-neuronx. See neuron samples repo
- See more neuron samples changes in neuron samples release notes
- Added samples for pre-training GPT-3 23B, 46B and 175B models using neuronx-nemo-megatron library. See aws-neuron-parallelcluster-samples
- Announced End of Support for GPT-3 training using aws-neuron-reference-for-megatron-lm library. See aws-neuron-parallelcluster-samples
- Updated bert-fine-tuning SageMaker sample by replacing amazon_reviews_multi dataset with amazon_polarity dataset. See aws-neuron-sagemaker-samples
Neuron 2.12.0#
Date: 07/19/2023
- Added best practices user guide for benchmarking performance of Neuron Devices. See Benchmarking Guide and Helper scripts
- Announcing end of support for Ubuntu 18. See more at Announcing end of support for Ubuntu 18
- Improved sidebar navigation in Documentation.
- Removed support for Distributed Data Parallel(DDP) Tutorial.
Neuron 2.11.0#
Date: 06/14/2023
- New Neuron Calculator Documentation section to help determine number of Neuron Cores needed for LLM Inference.
- Added App Note Generative LLM inference with Neuron
- New ML Libraries Documentation section to have NxD Core and Transformers NeuronX (transformers-neuronx)
- Improved Installation and Setup Guides for the different platforms supported. See more at Setup Guide
- Added Tutorial How to prepare trn1.32xlarge for multi-node execution
This document is relevant for: Inf1, Inf2, Trn1, Trn2