
This document is relevant for: Inf1, Inf2, Trn1, Trn2

NxD Inference Release Notes (neuronx-distributed-inference)

This document lists the release notes for the NeuronX Distributed (NxD) Inference library.

Neuronx Distributed Inference [0.3.5591] (Neuron 2.23.0 Release)

Date: 05/20/2025

NxD Inference is now generally available (GA) and out of beta as of the Neuron 2.23 release.

Features in this Release

Neuronx Distributed Inference [0.2.0] (Beta) (Neuron 2.22.0 Release)

Date: 04/03/2025

Models in this Release

Features in this Release

Backward Incompatible Changes

Other Changes

Known Issues and Limitations

Neuronx Distributed Inference [0.1.1] (Beta) (Neuron 2.21.1 Release)

Date: 01/14/2025

Bug Fixes

Neuronx Distributed Inference [0.1.0] (Beta) (Neuron 2.21 Release)

Date: 12/20/2024

Features in this Release

NeuronX Distributed (NxD) Inference (neuronx-distributed-inference) is an open-source, PyTorch-based inference library that simplifies deep learning model deployment on AWS Inferentia and Trainium instances. NxD Inference includes a model hub and modules that users can reference to implement their own models on Neuron.

This first release of NxD Inference (Beta) includes:

For more information about the features supported by NxDI, see NxD Inference Features Configuration Guide.
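
For a quick orientation, the following is a minimal end-to-end sketch that follows the NxD Inference quickstart pattern: configure, compile, load, and generate. The model paths are placeholders, values such as tp_degree=32 and seq_len=1024 are illustrative, and exact class and argument names should be verified against the version of neuronx-distributed-inference you have installed.

```python
# Minimal sketch of the NxD Inference flow (configure -> compile -> load -> generate).
# Paths and parallelism settings are placeholders; verify class and argument names
# against your installed neuronx-distributed-inference version.
from transformers import AutoTokenizer

from neuronx_distributed_inference.models.config import NeuronConfig
from neuronx_distributed_inference.models.llama.modeling_llama import (
    LlamaInferenceConfig,
    NeuronLlamaForCausalLM,
)
from neuronx_distributed_inference.utils.hf_adapter import (
    HuggingFaceGenerationAdapter,
    load_pretrained_config,
)

model_path = "/home/ubuntu/models/Llama-3.1-8B"            # Hugging Face checkpoint
compiled_model_path = "/home/ubuntu/traced_models/Llama-3.1-8B"

# Describe how the model should be partitioned and shaped on Neuron cores.
neuron_config = NeuronConfig(
    tp_degree=32,   # tensor-parallel degree (illustrative; match your instance)
    batch_size=1,
    seq_len=1024,
)
config = LlamaInferenceConfig(
    neuron_config,
    load_config=load_pretrained_config(model_path),
)

# Compile the model for Neuron, then load the traced artifacts onto the device.
model = NeuronLlamaForCausalLM(model_path, config)
model.compile(compiled_model_path)
model.load(compiled_model_path)

# Generate through the Hugging Face-compatible adapter.
tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("Hello, Neuron!", return_tensors="pt")
generation_model = HuggingFaceGenerationAdapter(model)
outputs = generation_model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=128,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```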

Known Issues and Limitations

Longer Load Times for Large Models

Issue: Users may experience extended load times when working with large models, particularly during weight sharding and initial model load. This is especially noticeable with models like Llama 3.1 405B.

Root Cause: These delays are primarily due to storage performance limitations.

Recommended Workaround: To mitigate this issue, we recommend that you store model checkpoints in high-performance storage options:

By using these storage optimizations, you can reduce model load times and improve your overall workflow efficiency.

Note: Load times may still vary depending on model size and specific hardware configurations.
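
Beyond faster storage, one way to reduce repeat load times is to shard the weights once at compile time and reuse the saved shards on subsequent loads. The sketch below assumes NeuronConfig exposes a save_sharded_checkpoint option (present in recent NxD Inference releases; confirm against your installed version):

```python
# Sketch: pre-shard weights at compile time so subsequent loads skip re-sharding.
# Assumes NeuronConfig supports save_sharded_checkpoint; confirm against your
# installed neuronx-distributed-inference version.
from neuronx_distributed_inference.models.config import NeuronConfig

neuron_config = NeuronConfig(
    tp_degree=32,                  # illustrative tensor-parallel degree
    batch_size=1,
    seq_len=1024,
    save_sharded_checkpoint=True,  # write per-rank weight shards during compile
)
# After model.compile(compiled_model_path), the sharded weights live alongside
# the traced model (ideally on the high-performance storage recommended above),
# so model.load(compiled_model_path) reads shards directly instead of
# re-sharding the full checkpoint on every load.
```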

Other Issues and Limitations

Neuronx Distributed Inference [0.1.0] (Beta) (Trn2)

Date: 12/03/2024

Features in this Release

NeuronX Distributed (NxD) Inference (neuronx-distributed-inference) is an open-source, PyTorch-based inference library that simplifies deep learning model deployment on AWS Inferentia and Trainium instances. NxD Inference includes a model hub and modules that users can reference to implement their own models on Neuron.

This first release of NxD Inference (Beta) includes:

For more information about the features supported by NxDI, see NxD Inference Features Configuration Guide.
