
This document is relevant for: Inf2, Trn1, Trn2

nki.isa.tensor_copy

nki.isa.tensor_copy(src, *, mask=None, dtype=None, engine=engine.unknown, **kwargs)

Create a copy of the src tile within NeuronCore on-chip SRAMs using the Vector, Scalar, or GpSimd Engine.

The output tile has the same partition axis size and also the same number of elements per partition as the input tile src.

All three compute engines (Vector, Scalar, and GpSimd) can perform tensor copy. However, copy behavior differs slightly across engines:

In addition, since GpSimd Engine cannot access PSUM in NeuronCore, Scalar or Vector Engine must be chosen when the input or output tile is in PSUM (see NeuronCore-v2 Compute Engines for details). By default, this API returns a tile in SBUF, unless the returned value is assigned to a pre-declared PSUM tile.
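The PSUM constraint above can be expressed as a small engine filter. The following is a minimal pure-Python sketch and not part of the NKI API: the function name `allowed_copy_engines` and the string labels used for buffers and engines are hypothetical, chosen only to illustrate the rule.

```python
# Hypothetical helper, not part of NKI: models which engines may execute
# a tensor copy given where the input and output tiles live.
ALL_ENGINES = ("vector", "scalar", "gpsimd")
BUFFERS = ("sbuf", "psum")  # labels assumed for this sketch


def allowed_copy_engines(src_buffer, dst_buffer="sbuf"):
    """Return the engines that can copy from src_buffer to dst_buffer.

    dst_buffer defaults to "sbuf", mirroring the documented default
    placement of the returned tile.
    """
    for name in (src_buffer, dst_buffer):
        if name not in BUFFERS:
            raise ValueError(f"unknown buffer: {name!r}")
    if "psum" in (src_buffer, dst_buffer):
        # GpSimd Engine cannot access PSUM, so it is excluded.
        return tuple(e for e in ALL_ENGINES if e != "gpsimd")
    return ALL_ENGINES
```

In an actual kernel the corresponding choice is made through the `engine` argument of `nisa.tensor_copy`, for example `engine=nisa.vector_engine` as in the documented example.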

Estimated instruction cost:

max(MIN_II, N) engine cycles, where N is the number of elements per partition in the input tile, and MIN_II is the minimum instruction initiation interval for small input tiles. MIN_II is roughly 64 engine cycles.
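To make the cost estimate concrete, here is a minimal sketch of the formula in plain Python. The function name `estimated_copy_cycles` is hypothetical, and the value 64 for MIN_II is the approximate figure quoted above, not an exact hardware constant.

```python
MIN_II = 64  # approximate minimum initiation interval, in engine cycles


def estimated_copy_cycles(elements_per_partition, min_ii=MIN_II):
    """Estimate tensor_copy cost as max(MIN_II, N) engine cycles.

    N is the number of elements per partition of the input tile.
    """
    if elements_per_partition < 0:
        raise ValueError("elements_per_partition must be non-negative")
    return max(min_ii, elements_per_partition)
```

Under this estimate, tiles with 16 and 64 elements per partition both cost about 64 cycles, while a tile with 512 elements per partition costs about 512. The partition axis size does not appear in the formula, so only the per-partition element count matters here.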

Parameters:

Returns:

a tile with the same content and partition axis size as the src tile.

Example:

    import neuronxcc.nki.isa as nisa
    import neuronxcc.nki.language as nl
    ...

    ############################################################################
    # Example 1: Copy over the tensor to another tensor using the Vector engine.
    ############################################################################
    x = nl.load(in_tensor)
    x_copy = nisa.tensor_copy(x, engine=nisa.vector_engine)
    nl.store(out_tensor, value=x_copy)
