nki.language.store — AWS Neuron Documentation (original) (raw)
This document is relevant for: Inf2
, Trn1
, Trn2
nki.language.store#
nki.language.store(dst, value, *, mask=None, **kwargs)[source]#
Store into a tensor on device memory (HBM) from on-chip memory (SBUF).
See Memory hierarchy for detailed information.
Parameters:
- dst – HBM tensor to store the data into.
- value – An SBUF tile that contains the values to store.
- mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
Returns:
import neuronxcc.nki.language as nl
...
store into out_tensor[P, F] that is on HBM
from data_tile[P, F] that is on SBUF
nl.store(out_tensor, data_tile)
Note
Partition dimension size can’t exceed the hardware limitation of nki.language.tile_size.pmax
, see Tile size considerations.
Partition dimension has to be the first dimension in the index tuple of a tile. Therefore, data may need to be split into multiple batches to load/store, for example:
import neuronxcc.nki.language as nl
for i_b in nl.affine_range(4): data_tile = nl.zeros((128, 512), dtype=in_tensor.dtype)
...
store into out_tensor[4, 128, 512] one batch at a time
from data_tile[128, 512]
i_p, i_f = nl.mgrid[0:128, 0:512] nl.store(out_tensor[i_b, i_p, i_f], value=data_tile[i_p, i_f])
Also supports indirect DMA access with dynamic index values:
import neuronxcc.nki.language as nl ...
##################################################################################
Indirect DMA write example 1:
- data_tensor has shape [128 x 512].
- idx_tensor on HBM has shape [64] (with values [0, 2, 4, 6, ...]).
- idx_tensor values read from HBM and stored in SBUF idx_tile.
- data_tile of shape [64 x 512] values written into
HBM data_tensor indexed by values in idx_tile.
################################################################################## i_p = nl.arange(64)[:, None] i_f = nl.arange(512)[None, :] idx_tile = nl.load(idx_tensor[i_p]) # indices have to be in SB
nl.store(data_tensor[idx_tile[i_p, 0], i_f], value=data_tile[0:64, 0:512])
import neuronxcc.nki.isa as nisa import neuronxcc.nki.language as nl ...
#############################################################################################
Indirect DMA write example 2:
- data_tensor has shape [128 x 512].
- idx_tile on SBUF has shape [64 x 1] (with values [[0], [2], [4], ...] generated by iota)
- data_tile of shape [64 x 512] values written into
HBM data_tensor indexed by values in idx_tile.
############################################################################################# idx_expr = 2*nl.arange(64)[:, None] idx_tile = nisa.iota(idx_expr, dtype=np.int32)
nl.store(data_tensor[idx_tile, i_f], value=data_tile[0:64, 0:512])
This document is relevant for: Inf2
, Trn1
, Trn2