nki.isa.tensor_copy_dynamic_src - AWS Neuron Documentation
This document is relevant for: Inf2, Trn1, Trn2
nki.isa.tensor_copy_dynamic_src
nki.isa.tensor_copy_dynamic_src(src, *, mask=None, dtype=None, engine=engine.unknown, **kwargs)
Create a copy of the src tile within NeuronCore on-chip SRAMs using the Vector, Scalar, or GpSimd Engine, with src located at a dynamic offset within each partition.
Both source and destination tiles can be in either SBUF or PSUM. By default, this API returns a tile in SBUF, unless the returned value is assigned to a pre-declared PSUM tile.
The source and destination tiles must also have the same number of partitions and the same number of elements per partition.
The dynamic offset must be a scalar value residing in SBUF. If you have a list of dynamic offsets for gathering tiles in SBUF/PSUM, you may loop over each offset and call tensor_copy_dynamic_src once per offset.
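The one-call-per-offset pattern described above can be modeled on the host in plain Python. This is only a sketch of the intended semantics, not NKI code: it does not run on a NeuronCore, and the helper names copy_dynamic_src_ref and gather_ref are hypothetical, introduced here for illustration.

```python
# Host-side reference model in plain Python. NOT NKI code; all names are hypothetical.

def copy_dynamic_src_ref(src, offset, width):
    """Copy `width` elements starting at `offset` from every partition (row)."""
    return [row[offset:offset + width] for row in src]


def gather_ref(src, offsets, width=1):
    """Call the copy once per offset and place the resulting tiles side by side."""
    out = [[] for _ in src]
    for offset in offsets:
        tile = copy_dynamic_src_ref(src, offset, width)
        # Source and destination tiles share the partition count and the
        # number of elements per partition.
        assert len(tile) == len(src)
        assert all(len(t) == width for t in tile)
        for out_row, tile_row in zip(out, tile):
            out_row.extend(tile_row)
    return out


# 4 partitions x 8 elements; each value encodes (partition, column).
src = [[10 * p + c for c in range(8)] for p in range(4)]
gathered = gather_ref(src, offsets=[4, 5, 6, 7])
print(gathered[0])  # [4, 5, 6, 7]
print(gathered[3])  # [34, 35, 36, 37]
```

Each row of the nested list stands in for one partition, so the same offset is applied to every partition in a single call, which mirrors the "dynamic offset within each partition" wording.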
Estimated instruction cost:
max(MIN_II_DYNAMIC, N) engine cycles, where:
- N is the number of elements per partition in the src tile,
- MIN_II_DYNAMIC is the minimum instruction initiation interval for instructions with dynamic source location. MIN_II_DYNAMIC is roughly 600 engine cycles.
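As a rough illustration of the cost formula, the sketch below evaluates max(MIN_II_DYNAMIC, N) for a few tile widths. The constant 600 is the approximate figure quoted above, not an exact hardware value, and the per-loop total assumes the calls execute back to back on one engine with no overlap, which is a simplifying assumption. Function names are hypothetical.

```python
# Back-of-the-envelope cost model; the constant is approximate and the names are hypothetical.
MIN_II_DYNAMIC = 600  # "roughly 600 engine cycles" per the text above


def estimated_cycles(elements_per_partition, min_ii=MIN_II_DYNAMIC):
    """Estimated engine cycles for one call: max(MIN_II_DYNAMIC, N)."""
    return max(min_ii, elements_per_partition)


def estimated_gather_cycles(num_offsets, elements_per_partition):
    """Estimated total for one call per offset, assuming no overlap between calls."""
    return num_offsets * estimated_cycles(elements_per_partition)


print(estimated_cycles(1))             # 600
print(estimated_cycles(2048))          # 2048
print(estimated_gather_cycles(64, 1))  # 38400
```

Under this model, copies with fewer than about 600 elements per partition all cost the same, so gathering many narrow tiles is dominated by the fixed initiation interval rather than by the data volume.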
Parameters:
- src – the source of copy, must be a tile in SBUF or PSUM that is dynamically indexed within each partition.
- mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
- dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.
- engine – (optional) the engine to use for the operation: nki.isa.vector_engine, nki.isa.gpsimd_engine, nki.isa.scalar_engine or nki.isa.unknown_engine (default, lets the compiler select the best engine).
- return – the modified destination of copy.
Example:
import neuronxcc.nki.typing as nt
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
...

#########################################################################################
# TensorCopyDynamicSrc example 0:
# - src_tensor in HBM of shape [128, 512]
# - offsets in HBM of shape [1, 64] (with values [4, 5, 6, 7, ...])
# - Gather tiles of shape [128, 1] from src_tensor into out_tensor using offsets
#########################################################################################

# Load src_tensor and offsets into SBUF
src_tensor_sbuf: nt.tensor[128, 512] = nl.load(src_tensor)
offsets_sbuf: nt.tensor[1, 64] = nl.load(offsets)

# Copy into output tensor in SBUF
out_sbuf: nt.tensor[128, 64] = nl.ndarray([128, 64], dtype=src_tensor.dtype,
                                          buffer=nl.sbuf)

# Static indices to access a tile of shape [128, 1];
# add dynamic offsets to iy for tensor_copy_dynamic_src
ix, iy = nl.mgrid[0:128, 0:1]

for idx in nl.affine_range(offsets_sbuf.shape[1]):
  out_sbuf[ix, idx] = nisa.tensor_copy_dynamic_src(
      src_tensor_sbuf[ix, offsets_sbuf[0, idx] + iy])

nl.store(out_tensor, value=out_sbuf)
...

import neuronxcc.nki.typing as nt
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
...

#########################################################################################
# TensorCopyDynamicSrc example 1:
# - src_tensor in HBM of shape [128, 512, 4]
# - offsets in HBM of shape [1, 8] (with values [4, 5, 6, 7, ...]) to index into
#   second axis of src_tensor
# - Gather tiles of shape [128, 4] from src_tensor into out_tensor using offsets
#########################################################################################

# Load src_tensor and offsets into SBUF
src_tensor_sbuf: nt.tensor[128, 512, 4] = nl.load(src_tensor)
offsets_sbuf: nt.tensor[1, 8] = nl.load(offsets)

# Copy into output tensor in SBUF
out_sbuf: nt.tensor[128, 8, 4] = nl.ndarray([128, 8, 4], dtype=src_tensor.dtype,
                                            buffer=nl.sbuf)

# Static indices to access a tile of shape [128, 1, 4];
# use dynamic offsets directly to index the second axis for tensor_copy_dynamic_src
ix, _, iz = nl.mgrid[0:128, 0:1, 0:4]

for idx in nl.affine_range(offsets.shape[1]):
  out_sbuf[ix, idx, iz] = nisa.tensor_copy_dynamic_src(
      src_tensor_sbuf[ix, offsets_sbuf[0, idx], iz])

nl.store(out_tensor, value=out_sbuf)
...
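
To check a kernel like example 1 against expected values, a host-side reference can be handy. The sketch below is plain Python, not NKI code, and models only the indexing: out[p][i][z] = src[p][offsets[i]][z]. The helper name gather_second_axis_ref, the scaled-down shapes, and the concrete offset values are assumptions chosen for illustration.

```python
# Host-side reference for gathering along the second axis. NOT NKI code; names are hypothetical.

def gather_second_axis_ref(src, offsets):
    """Return out with out[p][i][z] == src[p][offsets[i]][z]."""
    return [[list(partition[off]) for off in offsets] for partition in src]


# Scaled-down stand-in for the [128, 512, 4] source: 2 partitions, 16 rows, 4 elements.
src = [[[100 * p + 4 * r + z for z in range(4)] for r in range(16)]
       for p in range(2)]
offsets = [4, 5, 6, 7, 8, 9, 10, 11]  # illustrative values only

out = gather_second_axis_ref(src, offsets)
print(len(out), len(out[0]), len(out[0][0]))  # 2 8 4
print(out[1][0])  # [116, 117, 118, 119]
```

The output keeps the partition count and the innermost size of the source, while its second axis has one entry per offset, matching the [128, 8, 4] shape of out_sbuf in example 1.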