Neuron Compiler (neuron-cc) for Inf1 Release Notes#

Introduction#

This document lists the release notes for the AWS Neuron compiler (neuron-cc). The Neuron compiler is an ahead-of-time compiler that ensures Neuron optimally utilizes the Inferentia chips.

Operator support for each input format can be queried directly from the compiler:

neuron-cc list-operators --framework {TENSORFLOW | MXNET | XLA}
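
For example, to list the operators supported for TensorFlow models (substitute MXNET or XLA to query the other input formats):

neuron-cc list-operators --framework TENSORFLOW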

The supported operators are also listed here:

TensorFlow: TensorFlow Neuron (tensorflow-neuron (TF1.x)) Supported operators

PyTorch: PyTorch Neuron (torch-neuron) Supported operators

XLA: TensorFlow Neuron (tensorflow-neuron (TF1.x)) Supported operators [XLA]

Apache MXNet: Neuron Apache MXNet Supported operators

Known issues and limitations - updated 11/23/2022#

Neuron Compiler release [1.21.0.0]#

Date: 12/21/2023

Neuron Compiler release [1.20.3.0]#

Date: 10/26/2023

Neuron Compiler release [1.19.0.0]#

Date: 09/15/2023

Neuron Compiler release [1.17.0.0]#

Date: 7/19/2023

New in this release#

Neuron Compiler release [1.16.2.0]#

Date: 6/14/2023

Neuron Compiler release [1.15.0.0]#

Date: 05/01/2023

Neuron Compiler release [1.14.3.0]#

Date: 04/19/2023

Neuron Compiler release [1.13.3.0]#

Date: 11/23/2022

Neuron Compiler release [1.11.7.0]#

Date: 08/02/2022

Neuron Compiler release [1.11.4.0]#

Date: 04/29/2022

Neuron Compiler release [1.10.3.0]#

Date: 03/25/2022

Neuron Compiler release [1.9.1.0]#

Date: 01/20/2022

Neuron Compiler release [1.8.5.0]#

Date: 01/05/2022

New in this release#

Neuron Compiler release [1.8.2.0]#

Date: 12/15/2021

New in this release#

Neuron Compiler release [1.7.3.0]#

Date: 10/27/2021

New in this release#

[1.6.13.0]#

Date 08/12/2021

New in this release#

Resolved issues#

[1.5.5.0]#

Date 07/02/2021

Summary#

New in this release#

[1.4.0.0]#

Date 5/28/2021

Summary#

New in this release#

[1.3.0.0]#

Date 4/30/2021

Summary#

New in this release#

Resolved Issues#

[1.2.7.0]#

Date 2/24/2021

Summary#

Fix for CVE-2021-3177.

[1.2.2.0]#

Date 1/30/2021

Summary#

Added support for multiple new operators (see the operators list) for TensorFlow and MXNet. Improved inference performance of language and object-recognition models on single NeuronCores as well as on multiple pipelined cores using NeuronCore Pipeline.

New in this release#

Resolved Issues#

[1.1.7.0]#

Date 12/23/2020

Summary#

Added support for PyTorch YOLOv4, a new framework-visible progress bar, and improved inference performance. We continue to streamline compiler usability by removing the need for options passed to control behavior, and we aim to remove the need for such options entirely. Some tutorials have been updated to reflect this, but ResNet-50 still requires these options to achieve maximum performance. Other usability improvements have been added, such as the compiler progress bar. As always, please let us know if there are other areas that we can improve.

New in this release#

Resolved Issues#

[1.0.24045.0]#

Date 11/17/2020

Summary#

Improved performance for pipelined execution (NeuronCore Pipeline).

New in this release#

Resolved Issues#

[1.0.20600.0]#

Date 9/22/2020

Summary#

Various performance improvements - both compilation time and inference speed of object recognition models.

New in this release#

Resolved Issues#

[1.0.18001.0]#

Date 8/08/2020

Summary#

Various performance improvements.

New in this release#

Improved performance of BERT base with -O2
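
As a rough sketch of how this option is applied (the model file, I/O configuration, and output names below are placeholders, and the exact set of accepted flags varies across neuron-cc versions), -O2 is passed alongside the usual compile arguments:

neuron-cc compile bert_graph.pb --framework TENSORFLOW --io-config io_config.json --output bert.neff -O2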

Resolved Issues#

[1.0.17937.0]#

Date 8/05/2020

Summary#

Various improvements.

[1.0.16861.0]#

Date 7/16/2020

Summary#

This release has some bug fixes and some functional and performance improvements to support compilation of several neural networks.

New in this release#

This release

Resolved Issues#

Other Notes#

[1.0.15275.0]#

Date 6/11/2020

Summary#

This release has some bug fixes and some functional and performance improvements to support compilation of several neural networks.

New in this release#

This release

Resolved Issues#

Other Notes#

Dependencies#

dmlc_nnvm==1.0.2574.0 dmlc_topi==1.0.2574.0 dmlc_tvm==1.0.2574.0 inferentia_hwm==1.0.1362.0 islpy==2018.2

[1.0.12696.0]#

Date 5/11/2020

Summary#

Bug fixes and some functional and performance improvements to several neural networks.

New in this release#

Resolved Issues#

Other Notes#

Dependencies#

dmlc_nnvm==1.0.2356.0 dmlc_topi==1.0.2356.0 dmlc_tvm==1.0.2356.0 inferentia_hwm==1.0.1294.0 islpy==2018.2

[1.0.9410.0]#

Date 3/26/2020

Summary#

Bug fixes and some functional and performance improvements to several neural networks.

New in this release#

Resolved Issues#

Known issues and limitations#

Other Notes#

Dependencies#

dmlc_nnvm==1.0.2049.0 dmlc_topi==1.0.2049.0 dmlc_tvm==1.0.2049.0 inferentia_hwm==1.0.897.0 islpy==2018.2

[1.0.7878.0]#

Date 2/27/2020

Summary#

Bug fixes and minor performance improvements.

New in this release#

None

Resolved Issues#

Known issues and limitations#

Other Notes#

Dependencies#

dmlc_nnvm-1.0.1826.0 dmlc_topi-1.0.1826.0 dmlc_tvm-1.0.1826.0 inferentia_hwm-1.0.897.0 islpy-2018.2

[1.0.6801.0]#

Date 1/27/2020

Summary#

Bug fixes and some performance enhancement related to data movement for BERT-type neural networks.

New in this release#

None

Resolved Issues#

“Internal ERROR: Data race between Op1 'Name1(...) [...]' and Op2 'Name2(...) [...]'”

2020-01-09 13:40:26.002594: E tensorflow/core/framework/op_segment.cc:54] Create kernel failed: Invalid argument: neff is invalid
2020-01-09 13:40:26.002637: E tensorflow/core/common_runtime/executor.cc:642] Executor failed to create kernel. Invalid argument: neff is invalid [[{{node bert/NeuronOp}}]]

Known issues and limitations#

See previous release notes. Some tutorials show the use of specific compiler options and flags; these help guide the compiler to achieve the best performance in specific cases. Please do not use them outside the specific tutorials shown, as the results may be undefined. These options should be considered beta and will be removed over time.

Other Notes#

Dependencies#

dmlc_nnvm-1.0.1619.0 dmlc_topi-1.0.1619.0 dmlc_tvm-1.0.1619.0 inferentia_hwm-1.0.839.0 islpy-2018.2

[1.0.5939.0]#

Date 12/20/2019

Summary#

Bug fixes and some performance enhancement for NeuronCore Pipeline.

New in this release#

Resolved Issues#

Known issues and limitations#

See previous release notes. Some tutorials show the use of specific compiler options and flags; these help guide the compiler to achieve the best performance in specific cases. Please do not use them outside the specific tutorials shown, as the results may be undefined. These options should be considered beta and will be removed over time.

Other Notes#

Dependencies#

[1.0.5301.0]#

Date 12/1/2019

Summary#

New in this release#

Resolved Issues#

Known Issues and Limitations#

See previous release notes. Resolved issues are shown in Resolved Issues.

Other Notes#

Please install g++ on AMIs that do not have g++ pre-installed (e.g. server AMIs):

Ubuntu

sudo apt-get install -y g++

Amazon Linux

sudo yum install -y gcc-c++

Supported Python versions:

Supported Linux distributions:

Dependencies#

[1.0.4680.0]#

Date: 11/25/2019

New in this release#

N/A, this is the first release.

Resolved issues#

N/A, this is the first release.

Known issues and limitations#

  1. Control flow: Inferentia has limited support for control flow. In general, Neuron can only support control-flow operators that are static at compile time, e.g. static-length RNN, top-k, sort, …
  2. Size of neural network: The size of a neural network is influenced by (a) the type of neural network (CNN, LSTM, MLP), (b) the number of layers, and (c) the size of the inputs (tensor dimensions, batch size, …). The current Neuron compiler release is limited in the size of neural network it can effectively optimize. As a result, CNN models (e.g. ResNet) are limited to an input size of up to 480x480 in FP16 with a batch size of 4; LSTM models (e.g. GNMT) are limited to about 900 time steps; and MLP models (like BERT) are limited to a sequence length of 128 with a batch size of 8.
  3. Data layout: The Neuron compiler supports multiple data layout formats (NCHW, NHWC, …). Non-NCHW input/output data layouts require Neuron to insert additional transpose operations, causing a degradation in performance.
  4. Object detection models: Computer-vision object detection and segmentation models are not supported by the current release.
  5. Reduced data types: The INT8 data type is not currently supported by the Neuron compiler.
  6. Tensor residency: When a sub-graph executed on the host communicates with a sub-graph executing on NeuronCores, tensors are copied between host and Inferentia memory via communication queues for each inference, which may degrade end-to-end performance.
  7. Primary inputs in NeuronCore Pipeline mode: When a neural network is executed in NeuronCore Pipeline mode, only the first operator in the network can receive primary inputs from the host (see the compile sketch after this list).
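
As an illustration of how NeuronCore Pipeline mode is requested (a sketch only; the model file and I/O configuration are placeholders), pipelined execution across, for example, four NeuronCores is selected at compile time with the --neuroncore-pipeline-cores option:

neuron-cc compile resnet50_graph.pb --framework TENSORFLOW --io-config io_config.json --neuroncore-pipeline-cores 4 --output resnet50.neff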

Other Notes#

Dependencies#

This document is relevant for: Inf1