nki.isa.tensor_scalar_reduce — AWS Neuron Documentation
This document is relevant for: Inf2, Trn1, Trn2
nki.isa.tensor_scalar_reduce
nki.isa.tensor_scalar_reduce(*, data, op0, operand0, reduce_op, reduce_res, reverse0=False, dtype=None, mask=None, **kwargs)
Perform the same computation as nisa.tensor_scalar with one math operator, and also reduce the nisa.tensor_scalar result along the free dimension, using the Vector Engine.
Refer to nisa.tensor_scalar for the semantics of data/op0/operand0. Unlike regular nisa.tensor_scalar, which supports two operators, this API supports only one. In addition, op0 must be an arithmetic operation listed in Supported Math Operators for NKI ISA. Bitvec operators are not supported in this API.
In addition to the nisa.tensor_scalar computation, this API also performs a reduction along the free dimension(s) of the nisa.tensor_scalar result, at a small additional performance cost. The reduction result is written in place to reduce_res, which must be an SBUF/PSUM tile with the same partition axis size as the input tile data and one element per partition. The reduce_op can be any of nl.add, nl.subtract, nl.multiply, nl.max, or nl.min.
The reduction axis is not configurable in this API. If the input tile has multiple free axes, the API reduces across all of them.
\[\begin{split}result = data \; \langle op0 \rangle \; operand0 \\ reduce\_res = reduce\_op(result, axis=\langle free\ axes \rangle)\end{split}\]
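The two formulas above can be modelled in plain Python as a rough reference for what the instruction computes. This is only a sketch of the semantics on nested lists, not NKI code: the helper name `tensor_scalar_reduce_ref` and the list-based tile layout (one outer entry per partition) are assumptions made for illustration, and the model ignores data types, masking, and any hardware-specific behaviour.

```python
import operator
from functools import reduce


def _flatten(x):
    """Yield every scalar in a (possibly nested) list, i.e. across all free axes."""
    for item in x:
        if isinstance(item, list):
            yield from _flatten(item)
        else:
            yield item


def _map_nested(fn, x):
    """Apply fn to every scalar of a nested list, preserving the nesting."""
    if isinstance(x, list):
        return [_map_nested(fn, item) for item in x]
    return fn(x)


def tensor_scalar_reduce_ref(data, op0, operand0, reduce_op):
    """Hypothetical reference model; data[p] holds the free-dimension content of partition p.

    operand0 is either a scalar or a nested list of shape (P, 1), one value per partition.
    Returns (result, reduce_res), where reduce_res has shape (P, 1).
    """
    result, reduce_res = [], []
    for p, row in enumerate(data):
        scalar = operand0[p][0] if isinstance(operand0, list) else operand0
        # result = data <op0> operand0, with operand0 as the right-hand side.
        out_row = _map_nested(lambda v: op0(v, scalar), row)
        result.append(out_row)
        # Reduce across every free axis of this partition to a single element.
        reduce_res.append([reduce(reduce_op, _flatten(out_row))])
    return result, reduce_res


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 partitions, 3 elements each
result, reduce_res = tensor_scalar_reduce_ref(data, operator.mul, 2.0, operator.add)
print(result)      # [[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]]
print(reduce_res)  # [[12.0], [30.0]]
```

In this model, a tile with several free axes is represented by deeper nesting inside each partition entry, and the reduction still collapses each partition to one value.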
Estimated instruction cost:
max(MIN_II, N) + MIN_II Vector Engine cycles, where N is the number of elements per partition in data, and MIN_II is the minimum instruction initiation interval for small input tiles. MIN_II is roughly 64 engine cycles.
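As a quick way to see how this estimate behaves, the cost formula can be evaluated for a few tile sizes. The sketch below simply encodes the expression above; the function name `estimated_cycles` is hypothetical, and `MIN_II = 64` is the approximate figure quoted in the text rather than an exact hardware constant.

```python
MIN_II = 64  # approximate minimum initiation interval, in Vector Engine cycles


def estimated_cycles(elements_per_partition, min_ii=MIN_II):
    """Estimated cost: max(MIN_II, N) + MIN_II Vector Engine cycles."""
    return max(min_ii, elements_per_partition) + min_ii


for n in (16, 64, 512, 2048):
    print(n, estimated_cycles(n))
# 16 128
# 64 128
# 512 576
# 2048 2112
```

Under this estimate, any tile with 64 or fewer elements per partition costs about 128 cycles, so for small tiles the fixed overhead dominates.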
Parameters:
- data – the input tile
- op0 – the math operator used with operand0 (any arithmetic operator in Supported Math Operators for NKI ISA is allowed)
- operand0 – a scalar constant or a tile of shape (data.shape[0], 1), where data.shape[0] is the partition axis size of the input data tile
- reverse0 – (not supported yet) reverse the ordering of inputs to op0; if false, operand0 is the rhs of op0; if true, operand0 is the lhs of op0
- reduce_op – the reduce operation to perform on the free dimension of data <op0> operand0
- reduce_res – a tile of shape (data.shape[0], 1), where data.shape[0] is the partition axis size of the input data tile. The result of reduce_op(data <op0> operand0) is written in-place into the tile.
- dtype – (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.
- mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
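The shape constraints on operand0 and reduce_res described in the parameter list can be checked ahead of time with a small helper. This is an illustrative sketch only: `check_shapes` is a hypothetical function, not part of NKI, and it works on plain shape tuples rather than real tiles.

```python
def check_shapes(data_shape, operand0_shape, reduce_res_shape):
    """Validate the documented shape constraints (hypothetical helper).

    data_shape: tuple whose first entry is the partition axis size.
    operand0_shape: None for a scalar constant, otherwise a shape tuple.
    reduce_res_shape: shape tuple of the reduction output tile.
    """
    expected = (data_shape[0], 1)
    if operand0_shape is not None and tuple(operand0_shape) != expected:
        raise ValueError(
            f"operand0 must be a scalar or have shape {expected}, got {operand0_shape}"
        )
    if tuple(reduce_res_shape) != expected:
        raise ValueError(
            f"reduce_res must have shape {expected}, got {reduce_res_shape}"
        )
    return expected


print(check_shapes((128, 512), None, (128, 1)))  # (128, 1)
```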
Returns:
an output tile of the (data <op0> operand0) computation