nki.isa.tensor_scalar_reduce - AWS Neuron Documentation

This document is relevant for: Inf2, Trn1, Trn2

nki.isa.tensor_scalar_reduce

nki.isa.tensor_scalar_reduce(*, data, op0, operand0, reduce_op, reduce_res, reverse0=False, dtype=None, mask=None, **kwargs)

Perform the same computation as nisa.tensor_scalar with a single math operator and, in addition, reduce the nisa.tensor_scalar result along the free dimension, using the Vector Engine.

Refer to nisa.tensor_scalar for the semantics of data, op0, and operand0. Unlike nisa.tensor_scalar, which supports two operators, this API supports only one. In addition, op0 must be an arithmetic operation listed in Supported Math Operators for NKI ISA; bitvec operators are not supported.

In addition to the nisa.tensor_scalar computation, this API reduces the nisa.tensor_scalar result along its free dimension(s), at a small additional performance cost. The reduction result is written in place to reduce_res, which must be an SBUF/PSUM tile with the same partition axis size as the input tile data and one element per partition. The reduce_op can be any of nl.add, nl.subtract, nl.multiply, nl.max, or nl.min.

The reduction axis is not configurable in this API. If the input tile has multiple free axes, the API reduces across all of them.

\[\begin{split}result = data \ \langle op0 \rangle \ operand0 \\ reduce\_res = reduce\_op(result, axis=\langle FreeAxis \rangle)\end{split}\]
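The semantics described above can be modeled in plain Python. The following is a minimal reference sketch, not NKI code and not the library's implementation: `tensor_scalar_reduce_ref`, `_map_free`, and `_flatten_free` are hypothetical helper names, tiles are represented as nested lists whose outermost index is the partition, and Python's `operator` functions stand in for `nl.add`, `nl.max`, and so on. Three points are assumptions of the sketch rather than statements from this page: `operand0` is either a scalar or one scalar per partition, `reverse0=True` swaps the operand order of `op0`, and the reduction folds left to right (which only matters for a non-commutative `reduce_op` such as subtraction).

```python
import operator
from functools import reduce


def _map_free(row, fn):
    """Apply fn to every element of one partition, preserving nested free axes."""
    if isinstance(row, list):
        return [_map_free(item, fn) for item in row]
    return fn(row)


def _flatten_free(row):
    """Flatten all free axes of one partition into a single flat list."""
    if isinstance(row, list):
        flat = []
        for item in row:
            flat.extend(_flatten_free(item))
        return flat
    return [row]


def tensor_scalar_reduce_ref(data, op0, operand0, reduce_op, reduce_res,
                             reverse0=False):
    """Reference model: result = data <op0> operand0, reduced per partition.

    reduce_res is a list with one slot per partition and is overwritten
    in place. The elementwise result is returned.
    """
    result = []
    for p, row in enumerate(data):
        # Assumption: operand0 is a scalar or a per-partition list of scalars.
        scalar = operand0[p] if isinstance(operand0, list) else operand0

        def apply(x, s=scalar):
            # Assumption: reverse0 swaps the operand order of op0.
            return op0(s, x) if reverse0 else op0(x, s)

        out_row = _map_free(row, apply)
        result.append(out_row)
        # Reduce across ALL free axes of this partition.
        reduce_res[p] = reduce(reduce_op, _flatten_free(out_row))
    return result


# Two partitions, one free axis: multiply by 2, then sum each partition.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
reduce_res = [0.0, 0.0]
result = tensor_scalar_reduce_ref(data, operator.mul, 2.0, operator.add, reduce_res)
# result     == [[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]]
# reduce_res == [12.0, 30.0]

# Two partitions, two free axes: add 1, then take the max over both free axes.
data3 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
reduce_res3 = [0, 0]
result3 = tensor_scalar_reduce_ref(data3, operator.add, 1, max, reduce_res3)
# reduce_res3 == [5, 9]
```

In the second call the reduction collapses both free axes at once, leaving one value per partition, which mirrors the requirement that reduce_res hold one element per partition.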

Estimated instruction cost:

max(MIN_II, N) + MIN_II Vector Engine cycles, where
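The cost formula can be evaluated directly. The sketch below uses a hypothetical helper, `estimated_cycles`, and treats both N and MIN_II as plain inputs, because this section does not define them. The value 64 used for `min_ii` in the examples is a placeholder chosen for illustration, not a figure taken from this page.

```python
def estimated_cycles(n, min_ii):
    """Evaluate max(MIN_II, N) + MIN_II for the given inputs."""
    return max(min_ii, n) + min_ii


# Placeholder MIN_II of 64 (illustrative assumption only).
small = estimated_cycles(16, 64)    # N below MIN_II: 64 + 64 = 128
large = estimated_cycles(512, 64)   # N above MIN_II: 512 + 64 = 576
```

The shape of the formula is the useful part: while N stays at or below MIN_II the estimate is flat at 2 * MIN_II, and beyond that it grows linearly with N plus a fixed MIN_II term.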

Parameters:

Returns:

an output tile containing the result of the (data <op0> operand0) computation
