nki.isa.dma_copy — AWS Neuron Documentation

This document is relevant for: Inf2, Trn1, Trn2

nki.isa.dma_copy

nki.isa.dma_copy(*, dst, src, mask=None, dst_rmw_op=None, oob_mode=oob_mode.error, dge_mode=dge_mode.unknown)

Copy data from src to dst using the DMA engine. Both src and dst tiles can be in device memory (HBM) or SBUF. However, if both tiles are in SBUF, consider using nisa.tensor_copy instead for better performance.

Parameters:

A cast is performed if src and dst have different dtypes.
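The casting rule can be pictured with a small NumPy sketch. It is only an analogy running on the host, not NKI code: the arrays `src` and `dst` are hypothetical stand-ins for tiles, and NumPy's own rounding rules apply here, which may differ from what the DMA engine does.

```python
import numpy as np

# Hypothetical stand-ins for the source and destination tiles.
src = np.array([[0.5, 1.5, 2.5], [3.5, 4.5, 5.5]], dtype=np.float32)
dst = np.zeros(src.shape, dtype=np.float16)

# Copying into dst converts each element to dst's dtype,
# i.e. it behaves like dst[...] = src.astype(dst.dtype).
np.copyto(dst, src)
```

After the copy, `dst` keeps its own dtype (`float16`) and holds the converted values of `src`.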

Example:

    import neuronxcc.nki.isa as nisa

    ############################################################################
    # Example 1: Copy a tensor to another tensor
    ############################################################################
    nisa.dma_copy(dst=b, src=a)

    import numpy as np
    import neuronxcc.nki.isa as nisa
    import neuronxcc.nki.language as nl
    from neuronxcc.nki.typing import tensor

    ############################################################################
    # Example 2: Load elements from HBM with indirect addressing. If addressing
    # results in out-of-bounds access, the operation will fail.
    ############################################################################
    ...
    n, m = in_tensor.shape
    ix, iy = nl.mgrid[0:n//2, 0:m]

    expr_arange = 2*nl.arange(n//2)[:, None]
    idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)

    out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)
    nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.error)

    import numpy as np
    import neuronxcc.nki.isa as nisa
    import neuronxcc.nki.language as nl
    from neuronxcc.nki.typing import tensor

    ############################################################################
    # Example 3: Load elements from HBM with indirect addressing. If addressing
    # results in out-of-bounds access, the operation will fail.
    ############################################################################
    ...
    n, m = in_tensor.shape
    ix, iy = nl.mgrid[0:n//2, 0:m]

    # indices are out of range on purpose to demonstrate the error
    expr_arange = 3*nl.arange(n//2)[:, None]
    idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)

    out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)
    nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.error)

    import numpy as np
    import neuronxcc.nki.isa as nisa
    import neuronxcc.nki.language as nl
    from neuronxcc.nki.typing import tensor

    ############################################################################
    # Example 4: Load elements from HBM with indirect addressing. If addressing
    # results in out-of-bounds access, the operation will skip indices.
    ############################################################################
    ...
    n, m = in_tensor.shape
    ix, iy = nl.mgrid[0:n//2, 0:m]

    # indices are out of range on purpose
    expr_arange = 3*nl.arange(n//2)[:, None]
    idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)

    out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)
    nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.skip)
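The indirect loads in Examples 2 to 4 can be modelled on the host with NumPy to see what the two out-of-bounds modes mean for the result. This is a rough reference sketch, not the Neuron implementation: `gather_rows` is a hypothetical helper, the string values "error" and "skip" stand in for nisa.oob_mode, and treating negative indices as out of bounds is an assumption.

```python
import numpy as np

def gather_rows(src, idx, dst, oob_mode="error"):
    """Host-side model of an indirect load: dst[i, :] = src[idx[i], :].

    Hypothetical helper, not part of NKI. oob_mode is a plain string
    ("error" or "skip") standing in for nisa.oob_mode.
    """
    idx = np.asarray(idx).reshape(-1)
    # Assumption: any index outside [0, number of rows) is out of bounds.
    in_bounds = (idx >= 0) & (idx < src.shape[0])
    if oob_mode == "error" and not in_bounds.all():
        raise IndexError(f"out-of-bounds row indices: {idx[~in_bounds].tolist()}")
    # Copy only rows with a valid index; skipped rows keep their old value.
    dst[in_bounds] = src[idx[in_bounds]]
    return dst

n, m = 8, 4
in_tensor = np.arange(n * m, dtype=np.float32).reshape(n, m)
idx = 3 * np.arange(n // 2)  # [0, 3, 6, 9]; 9 is out of range for n = 8
out = np.full((n // 2, m), -1, dtype=in_tensor.dtype)
gather_rows(in_tensor, idx, out, oob_mode="skip")
```

In this model the first three rows of `out` receive rows 0, 3 and 6 of `in_tensor`, while the last row keeps the memset value -1 because its index was skipped. Calling the helper with `oob_mode="error"` on the same indices raises instead.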

    import numpy as np
    import neuronxcc.nki.isa as nisa
    import neuronxcc.nki.language as nl
    from neuronxcc.nki.typing import tensor

    ############################################################################
    # Example 5: Store elements to HBM with indirect addressing and with
    # read-modify-write operation.
    ############################################################################
    ...
    n, m = in_tensor.shape
    ix, iy = nl.mgrid[0:n, 0:m]

    expr_arange = 2*nl.arange(n)[:, None]
    inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])
    idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)

    out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=1, dtype=in_tensor.dtype)
    nl.store(out_tensor, value=out_tile)
    nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, dst_rmw_op=np.add)
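The effect of the read-modify-write store in Example 5 can be approximated on the host with NumPy. This is a sketch of the intended arithmetic under the assumption that `dst_rmw_op=np.add` means "add the source row to whatever the destination row already holds"; it is not the Neuron implementation, and the small shapes are chosen only for illustration.

```python
import numpy as np

n, m = 4, 3
inp = np.arange(n * m, dtype=np.float32).reshape(n, m)  # stands in for inp_tile
idx = 2 * np.arange(n)                                   # [0, 2, 4, 6]
out = np.ones((2 * n, m), dtype=inp.dtype)               # destination pre-filled with 1

# Read-modify-write with add: out[idx[i], :] += inp[i, :] for every row i.
np.add.at(out, idx, inp)
```

Rows 0, 2, 4 and 6 of `out` now hold `1 + inp`, and the odd rows are left at 1. `np.add.at` also accumulates when an index repeats; whether the hardware behaves the same way for duplicate indices is not stated in this page.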

    import numpy as np
    import neuronxcc.nki.isa as nisa
    import neuronxcc.nki.language as nl
    from neuronxcc.nki.typing import tensor

    ############################################################################
    # Example 6: Store elements to HBM with indirect addressing. If indirect
    # addressing results in out-of-bounds access, the operation will fail.
    ############################################################################
    ...
    n, m = in_tensor.shape
    ix, iy = nl.mgrid[0:n, 0:m]

    expr_arange = 2*nl.arange(n)[:, None]
    inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])
    idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)

    out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=-1, dtype=in_tensor.dtype)
    nl.store(out_tensor, value=out_tile)
    nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, oob_mode=nisa.oob_mode.error)

    import numpy as np
    import neuronxcc.nki.isa as nisa
    import neuronxcc.nki.language as nl
    from neuronxcc.nki.typing import tensor

    ############################################################################
    # Example 7: Store elements to HBM with indirect addressing. If indirect
    # addressing results in out-of-bounds access, the operation will fail.
    ############################################################################
    ...
    n, m = in_tensor.shape
    ix, iy = nl.mgrid[0:n, 0:m]

    # indices are out of range on purpose to demonstrate the error
    expr_arange = 3*nl.arange(n)[:, None]
    inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])
    idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)

    out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=-1, dtype=in_tensor.dtype)
    nl.store(out_tensor, value=out_tile)
    nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, oob_mode=nisa.oob_mode.error)

    ############################################################################
    # Example 8: Store elements to HBM with indirect addressing. If indirect
    # addressing results in out-of-bounds access, the operation will skip indices.
    ############################################################################
    ...
    n, m = in_tensor.shape
    ix, iy = nl.mgrid[0:n, 0:m]

    # indices are out of range on purpose
    expr_arange = 3*nl.arange(n)[:, None]
    inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])
    idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)

    out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=-1, dtype=in_tensor.dtype)
    nl.store(out_tensor, value=out_tile)
    nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, oob_mode=nisa.oob_mode.skip)
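For the indirect stores in Examples 6 to 8, a host-side NumPy model may help show which destination rows end up written. As before, this is a sketch under stated assumptions rather than the actual DMA behaviour: `scatter_rows` is a hypothetical helper, the strings "error" and "skip" stand in for nisa.oob_mode, and negative indices are assumed to count as out of bounds.

```python
import numpy as np

def scatter_rows(src, idx, dst, oob_mode="error"):
    """Host-side model of an indirect store: dst[idx[i], :] = src[i, :].

    Hypothetical helper, not part of NKI. oob_mode is a plain string
    ("error" or "skip") standing in for nisa.oob_mode.
    """
    idx = np.asarray(idx).reshape(-1)
    # Assumption: any index outside [0, number of dst rows) is out of bounds.
    in_bounds = (idx >= 0) & (idx < dst.shape[0])
    if oob_mode == "error" and not in_bounds.all():
        raise IndexError(f"out-of-bounds row indices: {idx[~in_bounds].tolist()}")
    # Write only the source rows whose target index is valid.
    dst[idx[in_bounds]] = src[in_bounds]
    return dst

n, m = 4, 3
inp = np.arange(n * m, dtype=np.float32).reshape(n, m)  # stands in for inp_tile
idx = 3 * np.arange(n)  # [0, 3, 6, 9]; 9 is out of range for 2*n = 8 rows
out = np.full((2 * n, m), -1, dtype=inp.dtype)
scatter_rows(inp, idx, out, oob_mode="skip")
```

With "skip", rows 0, 3 and 6 of `out` receive the first three rows of `inp`, the fourth source row is dropped, and every other destination row keeps the value -1. With "error", the same call raises in this model.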
