Neuron Kernel Interface (NKI) release notes — AWS Neuron Documentation
Contents
- Neuron Kernel Interface (NKI) (Beta) [2.22]
- Neuron Kernel Interface (NKI) (Beta) [2.21]
- Neuron Kernel Interface (NKI) (Beta)
- Neuron Kernel Interface (NKI) (Beta) [2.20]
This document is relevant for: Inf2, Trn1, Trn2
Neuron Kernel Interface (NKI) release notes
Neuron Kernel Interface (NKI) (Beta) [2.22]
Date: 04/03/2025
- New modules and APIs:
  - nki.profile
  - nki.isa new APIs:
    - tensor_copy_dynamic_dst
    - tensor_copy_predicated
    - max8, nc_find_index8, nc_match_replace8
    - nc_stream_shuffle
  - nki.language new APIs: mod, fmod, reciprocal, broadcast_to, empty_like
- Improvements:
  - nki.isa.nc_matmul now supports the PE tiling feature
  - nki.isa.activation updated to support reduce operation and reduce commands
  - nki.isa.engine enum
  - engine parameter added to more nki.isa APIs that support engine selection (e.g., tensor_scalar, tensor_tensor, memset)
  - Documentation for nki.kernels has been moved to GitHub: https://aws-neuron.github.io/nki-samples. The source code can be viewed at aws-neuron/nki-samples.
    - These kernels are still shipped as part of the Neuron package in the neuronxcc.nki.kernels module
- Documentation updates:
  - Kernels public repository https://aws-neuron.github.io/nki-samples
  - Updated profiling guide to use nki.profile instead of nki.benchmark
  - NKI ISA Activation functions table now has valid input data ranges listed
  - NKI ISA Supported Math operators now have supported engines listed
  - Clarify += syntax support/limitation
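Of the new nki.language APIs listed above, mod and fmod are the pair most easily confused. The sketch below assumes they follow the semantics of their NumPy namesakes, numpy.mod and numpy.fmod (an assumption; check the nki.language API reference). It uses only the Python standard library to show how the two remainder conventions differ for negative operands. The helper names floored_mod and truncated_fmod are hypothetical and are not part of NKI.

```python
import math

def floored_mod(x, y):
    # Floored modulo: the result takes the sign of the divisor y.
    # This is the convention of Python's % operator and of numpy.mod.
    return x % y

def truncated_fmod(x, y):
    # Truncated remainder: the result takes the sign of the dividend x.
    # This is the convention of C fmod, math.fmod, and numpy.fmod.
    return math.fmod(x, y)

# The two conventions agree when both operands have the same sign
# and differ when the signs are mixed.
for x, y in [(7.0, 3.0), (-7.0, 3.0), (7.0, -3.0)]:
    print(f"x={x:5}, y={y:5}  floored={floored_mod(x, y):5}  "
          f"truncated={truncated_fmod(x, y):5}")
```

If the assumption holds, code ported from NumPy should pick mod or fmod to match whichever NumPy function it used before, since swapping them silently changes results for negative inputs.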
Neuron Kernel Interface (NKI) (Beta) [2.21]
Date: 12/16/2024
- New modules and APIs:
  - nki.compiler module with Allocation Control and Kernel decorators, see guide for more info.
  - nki.isa: new APIs (activation_reduce, tensor_partition_reduce, scalar_tensor_tensor, tensor_scalar_reduce, tensor_copy, tensor_copy_dynamic_src, dma_copy), new activation functions (identity, silu, silu_dx), and target query APIs (nc_version, get_nc_version).
  - nki.language: new APIs (shared_identity_matrix, tan, silu, silu_dx, left_shift, right_shift, ds, spmd_dim, nc).
  - New datatype: float8_e5m2
  - New kernels (allocated_fused_self_attn_for_SD_small_head_size, allocated_fused_rms_norm_qkv) added, kernels moved to public repository.
- Improvements:
  - Semantic analysis checks for nki.isa APIs to validate supported ops, dtypes, and tile shapes.
  - Standardized naming conventions with keyword arguments for common optional parameters.
  - Transition from function calls to kernel decorators (jit, benchmark, baremetal, simulate_kernel).
- Documentation updates:
Neuron Kernel Interface (NKI) (Beta)
Date: 12/03/2024
- NKI support for Trainium2, including full integration with Neuron Compiler. Users can directly shard NKI kernels across multiple Neuron Cores from an SPMD launch grid. See tutorial for more info. See Trainium2 Architecture Guide for an initial version of the architecture specification (more details to come in future releases).
- New calling convention in NKI kernels, where kernel output tensors are explicitly returned from the kernel instead of pass-by-reference. See any NKI tutorial for code examples.
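The return-based calling convention described above can be sketched as follows, in the style of the public NKI tutorials. This is an illustrative sketch, not a definitive implementation: the kernel name tensor_add_kernel is hypothetical, the inputs are assumed to be equally shaped device tensors whose first dimension fits in a single tile, and the import is guarded so the snippet still loads on a machine without the Neuron SDK (neuronxcc) installed.

```python
# Requires the Neuron SDK to actually define and run the kernel; without
# it, the guarded import simply leaves HAVE_NKI set to False.
try:
    from neuronxcc import nki
    import neuronxcc.nki.language as nl
    HAVE_NKI = True
except ImportError:
    HAVE_NKI = False

if HAVE_NKI:

    @nki.jit
    def tensor_add_kernel(a_input, b_input):
        # Hypothetical example kernel: element-wise add of two tensors.
        assert a_input.shape == b_input.shape
        # Assumption: the whole input fits in one tile along the partition axis.
        assert a_input.shape[0] <= nl.tile_size.pmax

        # The output tensor is allocated in device memory (HBM) inside the
        # kernel instead of being passed in by the caller.
        c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype,
                              buffer=nl.shared_hbm)

        # Load inputs into on-chip memory, add them, and store the result.
        a_tile = nl.load(a_input)
        b_tile = nl.load(b_input)
        nl.store(c_output, value=nl.add(a_tile, b_tile))

        # The output is returned explicitly rather than written through
        # a pass-by-reference argument.
        return c_output
```

Under the older pass-by-reference style, the caller would have allocated the output and passed it as an extra kernel argument. With the convention shown here, a call site reads like an ordinary function call, for example c = tensor_add_kernel(a, b).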
Neuron Kernel Interface (NKI) (Beta) [2.20]
Date: 09/16/2024
- This release includes the beta launch of the Neuron Kernel Interface (NKI). NKI is a programming interface enabling developers to build optimized compute kernels on top of Trainium and Inferentia. NKI empowers developers to enhance deep learning models with new capabilities, performance optimizations, and scientific innovation. It natively integrates with PyTorch and JAX, providing a Python-based programming environment with Triton-like syntax and tile-level semantics that offers a familiar programming experience for developers. Additionally, to enable bare-metal access for precisely programming the instructions used by the chip, this release includes a set of NKI APIs (nki.isa) that directly emit Neuron Instruction Set Architecture (ISA) instructions in NKI kernels.
- In addition to documentation, we've included many of the innovative kernels used within the neuron-compiler, such as mamba and flash attention, as open-source samples in a new nki-samples GitHub repository. New kernel contributions are welcome via GitHub pull requests, as are feature requests and bug reports via GitHub Issues. For more information, see the latest documentation. Included in this initial beta release are in-depth getting started, architecture, profiling, and performance guides, along with multiple tutorials, API reference documents, documented known issues, and frequently asked questions.