Neuron Documentation Release Notes

Table of contents

Neuron 2.21.0
Neuron 2.20.0
Neuron 2.19.0
Neuron 2.18.0
Neuron 2.16.0
Neuron 2.15.0
Neuron 2.14.0
Neuron 2.13.0
Neuron 2.12.0
Neuron 2.11.0

Neuron 2.21.0

Date: 12/20/2024

Neuron Architecture and Features

Added Trainium2 Architecture guide. See Trainium2 Architecture
Added Trn2 Architecture guide. See Amazon EC2 Trn2 Architecture
Added Logical NeuronCore configuration guide. See Logical NeuronCore configuration
Added NeuronCore-v3 Architecture guide. See NeuronCore-v3 Architecture

Neuron Compiler

Added NKI tutorial for SPMD usage with multiple Neuron Cores on Trn2. See tutorial
Updated NKI FAQ with Trn2 FAQs. See NKI FAQ
Added Direct Allocation Developer Guide
Updated nki.isa API guide with support for new APIs.
Updated nki.language API guide with support for new APIs.
Updated nki.compiler API guide with support for new APIs.
Updated NKI datatype guide with support for float8_e5m2.
Updated kernels with support for allocated_fused_self_attn_for_SD_small_head_size and allocated_fused_rms_norm_qkv kernels

Neuron Runtime

Updated troubleshooting doc with information on device out-of-memory errors after upgrading to Neuron Driver 2.19 or later. See small_allocations_mempool

NeuronX Distributed Inference

Added Application Note to introduce NxD Inference. See Introducing NeuronX Distributed (NxD) Inference
Added NxD Inference Supported Features Guide. See NxD Inference Features Configuration Guide
Added NxD Inference Tutorial for Deploying Llama 3.1 405B (Trn2). See Tutorial: Deploying Llama3.1 405B (Trn2)
Added NxD Inference API Reference Guide. See nxd-inference-api-guides
Added NxD Inference Production Ready Models (Model Hub) Guide. See NxD Inference - Production Ready Models
Added Migration Guide from NxD examples to NxD Inference. See Migrating from NxD Core inference examples to NxD Inference
Added Migration Guide from Transformers NeuronX to NeuronX Distributed Inference. See Migrating from Transformers NeuronX to NeuronX Distributed(NxD) Inference
Added vLLM User Guide for NxD Inference. See vLLM User Guide for NxD Inference
Added tutorial for deploying Llama3.2 Multimodal Models. See Tutorial: Deploying Llama3.2 Multimodal Models

NeuronX Distributed Training

Updated Training APIs, Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism, YAML Configuration Settings, and Checkpoint Conversion with support for fused Q,K,V
Updated YAML Configuration Settings with support for Trn2 configuration API
Updated Direct Checkpoint Conversion with support for HuggingFace Model Conversion
Added tutorial for HuggingFace Llama3.1/Llama3-70B Pretraining. See HuggingFace Llama3.1/Llama3-70B Pretraining
Added tutorial for HuggingFace Llama3-8B Direct Preference Optimization (DPO) based Fine-tuning. See hf_llama3_8B_DPO

Transformers NeuronX

Updated Transformers NeuronX (transformers-neuronx) Developer Guide and PyTorch NeuronX Tracing API for Inference with support for CPU compilation.
Updated Transformers NeuronX (transformers-neuronx) Developer Guide to enable skipping the first Allgather introduced by flash decoding at the cost of duplicate Q weights.
Updated Transformers NeuronX (transformers-neuronx) Developer Guide with support for EAGLE speculation

Neuron Tools

Added Neuron Profiler 2.0 Beta User Guide with support for system profiles, integration with Perfetto, distributed workload support, etc. See Neuron Profiler 2.0 (Beta) User Guide
Updated nccom-test user guide to include support for Trn2. See NCCOM-TEST User Guide
Updated neuron-ls user guide to include support for Trn2. See Neuron LS User Guide
Updated neuron-monitor user guide to include support for Trn2. See Neuron Monitor User Guide
Updated neuron-top user guide to include support for Trn2. See Neuron Top User Guide
Added Ask Q Developer documentation for general Neuron guidance and jumpstarting NKI kernel development. See Ask Q Developer

PyTorch NeuronX

Added troubleshooting note for eager debug mode errors. See PyTorch Neuron (torch-neuronx) for Training Troubleshooting Guide
Added torch-neuronx cxx11 ABI documentation. See Install with support for C++11 ABI
Added Migration Guide from XLA_USE_BF16/XLA_DOWNCAST_BF16. See Migration From XLA_USE_BF16/XLA_DOWNCAST_BF16
Updated BERT tutorial to not use XLA_DOWNCAST_BF16 and updated BERT-Large pretraining phase to BFloat16 BERT-Large pretraining with AdamW and stochastic rounding. See Hugging Face BERT Pretraining Tutorial (Data-Parallel)
Added Application Note for PyTorch 2.5 support. See Introducing PyTorch 2.5 Support
Updated PyTorch NeuronX Environment Variables document with support for PyTorch 2.5. See PyTorch NeuronX Environment Variables

Misc

Added a third-party developer flow solutions page. See Third-party solutions
Added a third-party libraries page. See Third-party libraries

End of support announcements

Announcing end of support for Neuron DET tool starting next release
Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release
Announcing end of support for Python 3.8 in future releases
Announcing end of support for PyTorch 1.13 starting next release
Announcing end of support for PyTorch 2.1 starting next release
Neuron no longer includes support for Ubuntu20 DLCs and DLAMIs starting this release
Announcing maintenance mode for torch-neuron 1.9 and 1.10 versions

Neuron 2.20.0

Date: 09/16/2024

Neuron Compiler

Added Getting Started with NKI guide for implementing a simple “Hello World” style NKI kernel and running it on a Neuron Device (Trainium/Inferentia2). See Getting Started with NKI
Added NKI Programming Model guide for explaining the three main stages of the NKI programming model. See NKI Programming Model
Added NKI Kernel as a Framework Custom Operator guide for explaining how to insert a NKI kernel as a custom operator into a PyTorch or JAX model using simple code examples. See NKI Kernel as a Framework Custom Operator
Added NKI Tutorials for the following kernels: Tensor addition, Transpose2D, AveragePool2D, Matrix multiplication, RMSNorm, Fused Self Attention, LayerNorm, and Fused Mamba. See nki.kernels
Added NKI Kernels guide for optimized kernel examples. See nki.kernels
Added Trainium/Inferentia2 Architecture Guide for NKI. See Trainium/Inferentia2 Architecture Guide for NKI
Added Profiling NKI kernels with Neuron Profile. See Profiling NKI kernels with Neuron Profile
Added NKI Performance Guide for explaining a recipe to find performance bottlenecks of NKI kernels and apply common software optimizations to address such bottlenecks. See NKI Performance Guide
Added NKI API Reference Manual with nki framework and types, nki.language, nki.isa, NKI API Common Fields, and NKI API Errors. See NKI API Reference Manual
Added NKI FAQ. See NKI FAQ
Added NKI Known Issues. See NKI Known Issues
Updated Neuron Glossary with NKI terms. See Neuron Glossary
Added new NKI samples repository
Added average_pool2d, fused_mamba, layernorm, matrix_multiplication, rms_norm, sd_attention, tensor_addition, and transpose_2d kernel tutorials to the NKI samples repository. See NKI samples repository
Added unit and integration tests for each kernel. See NKI samples repository
Updated Custom Operators API Reference Guide with updated terminology (HBM). See Custom Operators API Reference Guide [Beta]

NeuronX Distributed Training (NxDT)

Added NxDT (Beta) Developer Guide. See Developer Guide
Added NxDT Developer Guide for Migrating from NeMo to Neuronx Distributed Training. See NxD Training Compatibility with NeMo
Added NxDT Developer Guide for Migrating from Neuron-NeMo-Megatron to Neuronx Distributed Training. See Migrating from Neuron-NeMo-Megatron to Neuronx Distributed Training
Added NxDT Developer Guide for Integrating a new dataset/dataloader. See Integrating a new dataset/dataloader
Added NxDT Developer Guide for Integrating a new model. See Integrating a New Model
Added NxDT Developer Guide for Registering an optimizer and LR scheduler. See Registering an optimizer and LR scheduler
Added NxDT YAML Configuration Overview. See YAML Configuration Settings
Added Neuronx Distributed Training Library Features documentation. See Neuronx Distributed Training Library Features
Added Installation instructions for NxDT. See Setup
Added Known Issues and Workarounds for NxDT. See Known Issues and Workarounds

NeuronX Distributed Core (NxD Core)

Updated Developer guide for save/load checkpoint (neuronx-distributed) with ZeRO-1 Optimizer State Offline Conversion. See Developer guide for save/load checkpoint
Added Developer guide for Standard Mixed Precision with NeuronX Distributed. See Developer guide for Standard Mixed Precision
Updated NeuronX Distributed API Guide with LoRA finetuning support. See Distributed Strategies APIs
Added Developer guide for LoRA finetuning with NeuronX Distributed. See Developer guide for LoRA finetuning
Updated CodeLlama tutorial with latest package versions. See tutorial
Added tutorial for Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning with NeuronX Distributed. See Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning
Updated links in Llama2 NxD Finetuning tutorial. See Fine-tuning Llama2 7B with tensor parallelism and ZeRO-1 optimizer using Neuron PyTorch-Lightning
Updated tokenizer download command in tutorials. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer, Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism, and Training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer

JAX Neuron

Added JAX Neuron Main page. See JAX Neuron (beta)
Added JAX Neuron plugin instructions. See jax-neuronx-setup
Added JAX Neuron setup instructions. See JAX Setup

PyTorch NeuronX

Updated Developer Guide for Training with PyTorch NeuronX with support for convolution in AMP. See Developer Guide for Training with PyTorch NeuronX.
Added inference samples for Wav2Vec2 conformer models with Relative Position Embeddings and Rotary Position Embedding. See sample and sample.
Updated the ViT sample with updated accelerate version. See sample
Updated PyTorch NeuronX Environment Variables with NEURON_TRANSFER_WITH_STATIC_RING_OPS. See PyTorch NeuronX Environment Variables
Added inference samples for PixArt Alpha and PixArt Sigma models. See sample and sample
Added benchmarking scripts for PixArt alpha. See benchmarking script

Transformers NeuronX

Updated Transformers NeuronX Developer Guide with Multi-node inference support (TP/PP). See Transformers NeuronX (transformers-neuronx) Developer Guide
Updated Transformers NeuronX Developer Guide with BDH layout support. See Transformers NeuronX (transformers-neuronx) Developer Guide
Updated Transformers NeuronX Developer Guide with Flash Decoding to support long sequence lengths up to 128k. See Transformers NeuronX (transformers-neuronx) Developer Guide
Updated Transformers NeuronX Developer Guide with presharded weights support. See Transformers NeuronX (transformers-neuronx) Developer Guide
Added Llama 3.1 405b sample with 16k sequence length. See tutorial
Added Llama 3.1 70b 64k tutorial. See tutorial
Added Llama 3.1 8b 128k tutorial. See tutorial
Removed the sample llama-3-8b-32k-sampling.ipynb and replaced it with Llama-3.1-8B model sample llama-3.1-8b-32k-sampling.ipynb. See sample

Neuron Runtime

Updated Neuron Runtime Troubleshooting guide with the latest hardware error codes and logs, and with a section on Neuron Runtime execution failures caused by out-of-bound access. See Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1
Updated Neuron Sysfs User Guide with new sysfs entries and device reset instructions. See Neuron Sysfs User Guide
Added Neuron Runtime Input Dump on Trn1 documentation. See nrt-input-dumps

Containers

Added Neuron Helm Chart repository to help streamline the deployment of AWS Neuron components on Amazon EKS. See repo
Updated Kubernetes container deployment process with Neuron Helm Chart documentation. See k8s-neuron-helm-chart
Added guide for Deploying Neuron Container on Elastic Container Service (ECS). See Deploy Neuron Container on Elastic Container Service (ECS) for Training
Added documentation for Neuron Plugins for Containerized Environments. See Neuron Plugins for Containerized Environments
Updated guide for locating DLC images. See Neuron Deep Learning Containers

Neuron Tools

Updated Neuron Profiler User Guide with Alternative output formats. See Neuron Profile User Guide

Software Maintenance and Misc

Updated the Neuron Software Maintenance Policy. See Neuron Software Maintenance policy
Added announcement and updated documentation for the start of end of support for Tensorflow-Neuron 1.x. See Tensorflow-Neuron 1.x no longer supported
Added announcement and updated documentation for the start of end of support for the ‘neuron-device-version’ field. See ‘neuron-device-version’ field in neuron-monitor no longer supported
Added announcement and updated documentation for the start of end of support for the ‘neurondevice’ resource name. See ‘neurondevice’ resource name in Neuron Device K8s plugin no longer supported
Added announcement and updated documentation for the start of end of support for AL2. See Neuron Runtime no longer supports Amazon Linux 2 (AL2)
Added announcement for maintenance mode for torch-neuron versions 1.9 and 1.10. See Announcing maintenance mode for torch-neuron 1.9 and 1.10 versions
Added supported Protobuf versions to the Neuron Release Artifacts. See Release Content
Updated Neuron Github Roadmap. See Roadmap

Neuron 2.19.0

Date: 07/03/2024

Updated Transformers NeuronX Developer guide with support for inference for longer sequence lengths with Flash Attention kernel. See Developer Guide.
Updated Transformers NeuronX developer guide with QKV Weight Fusion support. See Developer Guide.
Updated Transformers NeuronX continuous batching developer guide with updated vLLM instructions and models supported. See Developer Guide.
Updated Neuronx Distributed User guide with interleaved pipeline support. See Distributed Strategies APIs
Added Codellama 13b 16k tutorial with NeuronX Distributed Inference library. See sample
Updated PyTorch NeuronX Environment variables with custom SILU enabled via NEURON_CUSTOM_SILU. See PyTorch NeuronX Environment Variables
Updated ZeRO1 support to have FP32 master weights support and BF16 all-gather. See ZeRO-1 Tutorial.
Updated PyTorch 2.1 Application note with workaround for slower loss convergence for NxD LLaMA-3 70B pretraining using ZeRO1 tutorial. See introduce-pytorch-2-1.
Updated Neuron DLAMI guide with support for new 2.19 DLAMIs. See Neuron DLAMI User Guide.
Updated HF-BERT pre-training documentation for port forwarding. See Hugging Face BERT Pretraining Tutorial (Data-Parallel)
Updated T5 inference tutorial with transformer flag. See sample
Added support for Llama3 model training. See Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism and Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer
Added support for Flash Attention kernel for training longer sequences in NeuronX Distributed. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer and Distributed Strategies APIs
Updated Llama2 inference tutorial using NxD Inference library. See sample
Added new guide for Neuron node problem detection and recovery tool. See configuration and tutorial.
Added new guide for Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes. Supports monitoring with Prometheus and Grafana. See tutorial
Updated Neuron scheduler extension documentation about enforcing allocation of contiguous Neuron Devices for the pods based on the Neuron instance type. See tutorial
Updated Neuron Profiler User Guide with various UI enhancements. See Neuron Profile User Guide
Added NeuronPerf support in Llama2 inference tutorial in NeuronX Distributed. See sample
Added announcement for maintenance mode of MxNet. See Neuron support for MxNet enters maintenance mode
Added announcement for end of support of Neuron TensorFlow 1.x (Inf1). See Announcing end of support for Tensorflow-Neuron 1.x
Added announcement for end of support of AL2. See Announcing end of support for Neuron Runtime support of Amazon Linux 2 (AL2)
Added announcement for end of support of ‘neuron-device-version’ field in neuron-monitor. See Announcing end of support for ‘neuron-device-version’ field in neuron-monitor
Added announcement for end of support of ‘neurondevice’ resource name in Neuron Device K8s plugin. See Announcing end of support for ‘neurondevice’ resource name in Neuron Device K8s plugin
Added announcement for end of support for Protobuf versions <= 3.19 for PyTorch NeuronX. See Announcing end of support for Protobuf versions <= 3.19 for PyTorch NeuronX, NeuronX Distributed, and Transformers NeuronX libraries

Neuron 2.18.0

Date: 04/01/2024

Updated PyTorch NeuronX developer guide with Snapshotting support. See Snapshotting With Torch-Neuronx 2.1.
Updated Distributed Strategies APIs and Developer guide for Pipeline Parallelism with support for auto_partition API.
Updated Distributed Strategies APIs with enhanced checkpointing support with load API and async_save API.
Updated documentation for PyTorch Lightning to train models using pipeline parallelism. See API guide and Developer Guide.
Updated NeuronX Distributed developer guide with support for Autobucketing
Added PyTorch NeuronX developer guide for Autobucketing.
Updated Distributed Strategies APIs and Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism with support for asynchronous checkpointing.
Updated Transformers NeuronX Developer guide with support for streamer and stopping criteria APIs. See Developer Guide.
Updated Transformers NeuronX Developer guide with instructions for Repeating N-Gram Filtering. See Developer Guide.
Updated Transformers NeuronX developer guide with Top-K on-device sampling support [Beta]. See Developer Guide.
Updated Transformers NeuronX developer guide with Checkpointing support and automatic model selection. See Developer Guide.
Updated Transformers NeuronX Developer guide with support for speculative sampling [Beta]. See Developer Guide.
Added sample for training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer with neuronx-distributed. See Training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer.
Added Tutorial for codellama/CodeLlama-13b-hf model inference with 16K seq length using Transformers Neuronx. See sample.
Added Mixtral-8x7B Inference Sample/Notebook using TNx. See sample.
Added Mistral-7B-Instruct-v0.2 inference sample using TNx. See sample.
Added announcement for Maintenance mode of TensorFlow 1.x. See Tensorflow-Neuron 1.x enters maintenance mode.
Updated PyTorch 2.1 documentation to reflect stable (out of beta) support. See introduce-pytorch-2-1.
Updated PyTorch NeuronX environment variables to reflect stable (out of beta) support. See PyTorch NeuronX Environment Variables.
Updated Release Content with supported HuggingFace Transformers versions.
Added user guide instructions for Neuron DLAMI. See Neuron DLAMI User Guide.
Updated PyTorch Neuron for Trainium Hugging Face BERT MRPC task finetuning using Hugging Face Trainer API tutorial with latest Hugging Face Trainer API.
Updated Neuron Runtime API guide with support for nr_tensor_allocate. See Developer’s Guide - NeuronX Runtime.
Updated Neuron Sysfs User Guide with support for serial_number unique identifier.
Updated Custom Operators API Reference Guide [Beta] limitations and fixed nested sublists. See Neuron Custom C++ Operators Developer Guide [Beta].
Fixed issue in ZeRO-1 Tutorial.
Fixed potential hang during synchronization step in nccom-test. See NCCOM-TEST User Guide.
Updated troubleshooting guide with additional hardware error messaging. See Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1.
Updated DLC documentation. See Customize Neuron DLC and Deploy Neuron Container on EC2.

Neuron 2.16.0

Date: 12/21/2023

Added setup guide instructions for AL2023 OS. See Setup Guide
Added announcement for name change of Neuron Components. See Announcing Name Change for Neuron Components
Added announcement for End of Support for PyTorch 1.10. See Announcing End of Support for PyTorch Neuron version 1.10
Added announcement for End of Support for PyTorch 2.0 Beta. See Announcing End of Support for PyTorch NeuronX version 2.0 (beta)
Added announcement for moving NeuronX Distributed sample model implementations. See Announcing deprecation for NeuronX Distributed Training Samples in Neuron Samples Repository
Updated Transformers NeuronX developer guide with support for Grouped Query Attention (GQA). See developer guide
Added sample for Llama-2-70b model inference. See tutorial
Added documentation for PyTorch Lightning to train models using tensor parallelism and data parallelism. See api guide, developer guide and tutorial
Added documentation for Model and Optimizer Wrapper training API that handles the parallelization. See api guide and Developer guide for model and optimizer wrapper
Added documentation for New save_checkpoint and load_checkpoint APIs to save/load checkpoints during distributed training. See Developer guide for save/load checkpoint
Added documentation for a new Query-Key-Value (QKV) module in NeuronX Distributed for Training. See api guide and tutorial
Added new developer guide for Inference using NeuronX Distributed. See developer guide
Added Llama-2-7B model inference script ([html] [notebook])
Added App note on Support for PyTorch 2.1 (Beta). See introduce-pytorch-2-1
Added developer guide for replace_weights API to replace the separated weights. See PyTorch Neuron (torch-neuronx) Weight Replacement API for Inference
Added [Beta] script for training stabilityai/stable-diffusion-2-1-base and runwayml/stable-diffusion-v1-5 models. See script
Added [Beta] script for training facebook/bart-large model. See script
Added [Beta] script for stabilityai/stable-diffusion-2-inpainting model inference. See script
Added documentation for new Neuron Distributed Event Tracing (NDET) tool to help visualize execution trace logs and diagnose errors in multi-node workloads. See Neuron Distributed Event Tracing (NDET) User Guide
Updated Neuron Profile User guide with support for multi-worker jobs. See Neuron Profile User Guide
Minor updates to Custom Ops API reference guide. See Custom Operators API Reference Guide [Beta]

Neuron 2.15.0

Date: 10/26/2023

New introduce-pytorch-2-0 application note with torch-neuronx
New llama2_70b_tp_pp_tutorial and (sample script) using neuronx-distributed
New Model samples and tutorials documentation for a consolidated list of code samples and tutorials published by AWS Neuron.
New Neuron Software Classification documentation for alpha, beta, and stable Neuron SDK definitions and updated documentation references.
New Pipeline Parallelism Overview and Developer guide for Pipeline Parallelism documentation in neuronx-distributed
Updated Neuron Distributed API Guide regarding pipeline-parallelism support and checkpointing
New Activation Memory Reduction application note and Developer guide for Activation Memory reduction in neuronx-distributed
New Weight Sharing (Deduplication) notebook script
Added Finetuning script for google/electra-small-discriminator with torch-neuronx
Added ResNet50 training (Beta) tutorial and scripts with torch-neuronx
Added Vision Perceiver training sample with torch-neuronx
Added flan-t5-xl model inference tutorial using neuronx-distributed
Added HuggingFace Stable Diffusion 4X Upscaler model Inference on Trn1 / Inf2 sample script with torch-neuronx
Updated GPT-NeoX 6.9B and 20B model scripts to include selective checkpointing.
Added serialization support and removed -O1 flag constraint to Llama-2-13B model inference script tutorial with transformers-neuronx
Updated BERT script and Llama-2-7B script with PyTorch 2.0 support
Added option-argument llm-training to the existing --distribution_strategy compiler option to make specific optimizations related to training distributed models in Neuron Compiler CLI Reference Guide (neuronx-cc)
Updated Neuron Sysfs User Guide to include mem_ecc_uncorrected and sram_ecc_uncorrected hardware statistics.
Updated PyTorch NeuronX Tracing API for Inference to include io alias documentation
Updated Transformers NeuronX (transformers-neuronx) Developer Guide with serialization support.
Upgraded numpy version to 1.22.2 for various scripts
Updated LanguagePerceiver fine-tuning script to stable
Announcing End of Support for OPT example in transformers-neuronx
Announcing End of Support for “nemo” option-argument

Known Issues and Limitations

The following tutorials are currently not working. These tutorials will be updated once there is a fix.

Neuron 2.14.0

Date: 09/15/2023

Neuron Calculator now supports multiple model configurations for Tensor Parallel Degree computation. See Neuron Calculator
Announcement to deprecate --model-type=transformer-inference flag. See Announcing deprecation for --model-type=transformer-inference compiler flag
Updated HF ViT benchmarking script to use --model-type=transformer flag. See [script]
Updated torch_neuronx.analyze API documentation. See PyTorch NeuronX Analyze API for Inference
Updated Performance benchmarking numbers for models on Inf1, Inf2 and Trn1 instances with 2.14 release bits. See _benchmark
New tutorial for Training Llama2 7B with Tensor Parallelism and ZeRO-1 Optimizer using neuronx-distributed. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer
New tutorial for T5-3B model inference using neuronx-distributed (tutorial)
Updated Neuron Persistent Cache documentation regarding clarification of flags parsed by neuron_cc_wrapper tool which is a wrapper over Neuron Compiler CLI. See Neuron Persistent Cache
Added tokenizers_parallelism=true in various notebook scripts to suppress tokenizer warnings, making errors easier to detect
Updated Neuron device plugin and scheduler YAMLs to point to latest images. See yaml configs
Added notebook script to fine-tune deepmind/language-perceiver model using torch-neuronx. See sample script
Added notebook script to fine-tune clip-large model using torch-neuronx. See sample script
Added SD XL Base+Refiner inference sample script using torch-neuronx. See sample script
Upgraded default diffusers library from 0.14.0 to latest 0.20.2 in Stable Diffusion 1.5 and Stable Diffusion 2.1 inference scripts. See sample scripts
Added Llama-2-13B model training script using neuronx-nemo-megatron ( tutorial )

Neuron 2.13.0

Date: 08/28/2023

Added tutorials for GPT-NEOX 6.9B and 20B models training using neuronx-distributed. See more at Tutorials for NeuronX Distributed
Added TensorFlow 2.x (tensorflow-neuronx) analyze_model API section. See more at TensorFlow 2.x (tensorflow-neuron) analyze_model API
Updated setup instructions to fix path of existing virtual environments in DLAMIs. See more at setup guide
Updated setup instructions to fix pinned versions in upgrade instructions of setup guide. See more at setup guide
Updated tensorflow-neuron HF distilbert tutorial to improve performance by removing HF pipeline. See more at [html] [notebook]
Updated training troubleshooting guide in torch-neuronx to describe a network connectivity issue on trn1/trn1n 32xlarge with Ubuntu. See more at PyTorch Neuron (torch-neuronx) for Training Troubleshooting Guide
Added “Unsupported Hardware Operator Code” section to Neuron Runtime Troubleshooting page. See more at Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1
Removed ‘beta’ tag from neuronx-distributed section for training. neuronx-distributed Training is now considered stable and neuronx-distributed inference is considered as beta.
Added FLOP count (flop_count) and connected Neuron Device ids (connected_devices) to sysfs user guide. See Neuron Sysfs User Guide
Added tutorial for T5 model inference. See more at [notebook]
Updated neuronx-distributed api guide and inference tutorial. See more at Distributed Strategies APIs and Inference with Tensor Parallelism [Beta]
Announcing End of support for AWS Neuron reference for Megatron-LM starting Neuron 2.13. See more at AWS Neuron reference for Megatron-LM no longer supported
Announcing end of support for torch-neuron version 1.9 starting Neuron 2.14. See more at Announcing end of support for torch-neuron version 1.9
Upgraded numpy version to 1.21.6 in various training scripts for Text Classification
Added license for Nemo Megatron to SDK Maintenance Policy. See more at Neuron Software Maintenance policy
Updated bert-japanese training script to use multilingual-sentiments dataset. See hf-bert-jp (aws-neuron/aws-neuron-samples)
Added sample script for LLaMA V2 13B model inference using transformers-neuronx. See neuron samples repo
Added samples for training GPT-NEOX 20B and 6.9B models using neuronx-distributed. See neuron samples repo
Added sample scripts for CLIP and Stable Diffusion XL inference using torch-neuronx. See neuron samples repo
Added sample scripts for vision and language Perceiver models inference using torch-neuronx. See neuron samples repo
Added camembert training/finetuning example for Trn1 under hf_text_classification in torch-neuronx. See neuron samples repo
Updated Fine-tuning Hugging Face BERT Japanese model sample in torch-neuronx. See neuron samples repo
See more neuron samples changes in neuron samples release notes
Added samples for pre-training GPT-3 23B, 46B and 175B models using neuronx-nemo-megatron library. See aws-neuron-parallelcluster-samples
Announced End of Support for GPT-3 training using aws-neuron-reference-for-megatron-lm library. See aws-neuron-parallelcluster-samples
Updated bert-fine-tuning SageMaker sample by replacing amazon_reviews_multi dataset with amazon_polarity dataset. See aws-neuron-sagemaker-samples

Neuron 2.12.0

Date: 07/19/2023

Added best practices user guide for benchmarking performance of Neuron Devices. See Benchmarking Guide and Helper scripts
Announcing end of support for Ubuntu 18. See more at Announcing end of support for Ubuntu 18
Improved sidebar navigation in Documentation.
Removed support for Distributed Data Parallel (DDP) Tutorial.

Neuron 2.11.0

Date: 06/14/2023

New Neuron Calculator Documentation section to help determine number of Neuron Cores needed for LLM Inference.
Added App Note Generative LLM inference with Neuron
New ML Libraries Documentation section to have NxD Core and Transformers NeuronX (transformers-neuronx)
Improved Installation and Setup Guides for the different platforms supported. See more at Setup Guide
Added Tutorial How to prepare trn1.32xlarge for multi-node execution

This document is relevant for: Inf1, Inf2, Trn1, Trn2