nki.isa.activation_reduce — AWS Neuron Documentation

This document is relevant for: Inf2, Trn1, Trn2

nki.isa.activation_reduce#

nki.isa.activation_reduce(op, data, *, reduce_op, reduce_res, bias=None, scale=1.0, mask=None, dtype=None, **kwargs)[source]#

Perform the same computation as nisa.activation, and additionally perform a reduction along the free dimension of the nisa.activation result, using the Scalar Engine. The reduction result is stored in reduce_res.

This API is equivalent to calling nisa.activation with reduce_cmd=nisa.reduce_cmd.reset_reduce and passing in reduce_res. It is kept for backward compatibility; we recommend using nisa.activation going forward.

Refer to nisa.activation for semantics of op/data/bias/scale.

In addition to the nisa.activation computation, this API performs a reduction along the free dimension(s) of the nisa.activation result, at a small additional performance cost. The reduction result is returned in reduce_res in-place, which must be an SBUF/PSUM tile with the same partition axis size as the input tile data and one element per partition. On NeuronCore-v2, reduce_op can only be addition, np.add or nl.add.
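To visualize the shape contract described above, here is a minimal NumPy sketch (illustrative only, not NKI code): the input tile has a partition axis and a free axis, and the reduction leaves exactly one element per partition.

```python
import numpy as np

# Illustrative shapes: 128 partitions, 512 elements along the free axis.
data = np.ones((128, 512), dtype=np.float32)

# Emulate the reduction part of activation_reduce with an identity
# activation: add-reduce along the free axis, one element per partition.
reduce_res = np.add.reduce(data, axis=1, keepdims=True)

# reduce_res keeps the partition axis size of data: shape (128, 1).
print(reduce_res.shape)  # (128, 1)
```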

There are 128 registers on the Scalar Engine for storing reduction results, corresponding to the 128 partitions of the input. These registers are shared between activation and activation_reduce calls. This instruction first resets those registers to zero, performs the reduction on the values after the activation function is applied, stores the results into the registers, and finally reads the reduction results out of the registers into reduce_res.

Note that nisa.activation can also change the state of these registers. It is the user's responsibility to ensure correct ordering; the best practice is not to mix activation_reduce and activation calls.

The reduction axis is not configurable in this API. If the input tile has multiple free axes, the API reduces across all of them.

Mathematically, this API performs the following computation:

\[\begin{split}output = f_{act}(data * scale + bias) \\ reduce\_res = reduce\_op(output, axis=\text{<all free axes>})\end{split}\]
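Assuming, for illustration, that f_act is the exponential and reduce_op is addition, the computation above can be emulated in plain NumPy (a reference for semantics only, not how the Scalar Engine executes it; activation_reduce_ref is a hypothetical helper name):

```python
import numpy as np

def activation_reduce_ref(data, scale=1.0, bias=0.0):
    """NumPy reference: exp activation on (data * scale + bias), then an
    add-reduction across all free axes (every axis except partition axis 0)."""
    output = np.exp(data * scale + bias)
    free_axes = tuple(range(1, output.ndim))
    reduce_res = np.add.reduce(output, axis=free_axes, keepdims=True)
    return output, reduce_res

# Two free axes: both are reduced, leaving one element per partition.
data = np.zeros((4, 8, 16), dtype=np.float32)
output, reduce_res = activation_reduce_ref(data)
print(reduce_res.shape)            # (4, 1, 1)
print(float(reduce_res[0, 0, 0]))  # 128.0, since exp(0) == 1 summed over 8*16
```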

Estimated instruction cost:

max(MIN_II, N) + MIN_II Scalar Engine cycles, where N is the number of elements per partition in data, and MIN_II is the minimum instruction initiation interval for small input tiles.

Parameters:

op – the activation function to apply; see nisa.activation for supported ops.

data – the input tile.

reduce_op – the reduction operator; on NeuronCore-v2, only addition (np.add or nl.add) is supported.

reduce_res – an SBUF/PSUM tile that receives the reduction result; must have the same partition axis size as data and one element per partition.

bias – (optional) bias added to the scaled input; see nisa.activation.

scale – (optional, default 1.0) scale multiplied with the input; see nisa.activation.

mask – (optional) a compile-time predicate controlling which indices the instruction executes on.

dtype – (optional) data type of the output tile.

Returns:

output tile of the activation instruction; layout: same as input data tile
