nki.profile (AWS Neuron Documentation)

This document is relevant for: Inf2, Trn1, Trn2

nki.profile

nki.profile(func=None, **kwargs)

Profile a NKI kernel on a NeuronDevice by using nki.profile as a decorator.

Note

Similar to nki.baremetal, the function decorated with nki.profile expects numpy.ndarray objects as input/output tensors instead of ML framework tensor objects.
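
Because the decorated function expects numpy.ndarray inputs, tensors held in an ML framework have to be converted before the call. A minimal sketch follows; `to_numpy` is a hypothetical helper (not part of NKI), and it assumes PyTorch-style tensors expose `detach()`, `cpu()` and `numpy()`, while other array-likes (lists, or objects implementing `__array__` such as JAX arrays) go through `np.asarray`.

```python
import numpy as np


def to_numpy(tensor):
    """Convert a framework tensor (or any array-like) to a numpy.ndarray.

    Hypothetical helper, not part of NKI. Assumes PyTorch-style tensors
    expose detach()/cpu()/numpy(); everything else goes through np.asarray.
    """
    if isinstance(tensor, np.ndarray):
        return tensor
    # PyTorch-style tensors: drop autograd history, move to host, convert.
    if all(hasattr(tensor, m) for m in ("detach", "cpu", "numpy")):
        return tensor.detach().cpu().numpy()
    # Lists, scalars and objects implementing __array__ (e.g. JAX arrays).
    return np.asarray(tensor)
```

With the kernel from Listing 13, a call would then look like `nki_tensor_tensor_add(to_numpy(a), to_numpy(b))`, where `a` and `b` are framework tensors.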

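Parameters:

working_directory – Directory in which the profile artifacts (the NEFF file, the trace file, and the JSON profile summary files) are saved.
save_neff_name – File name for the saved NEFF file (file.neff in the example below).
save_trace_name – File name for the saved trace file (profile.ntff in the example below).
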
Returns:

None

Listing 13: An example

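```python
from neuronxcc import nki
import neuronxcc.nki.language as nl


@nki.profile(working_directory="/home/ubuntu/profiles",
             save_neff_name='file.neff',
             save_trace_name='profile.ntff')
def nki_tensor_tensor_add(a_tensor, b_tensor):
    # Allocate the output in shared HBM with the same shape and dtype as the input
    c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype, buffer=nl.shared_hbm)

    # Load both inputs from HBM into on-chip memory (SBUF)
    a = nl.load(a_tensor)
    b = nl.load(b_tensor)

    # Element-wise addition
    c = a + b

    # Write the result back to the output tensor in HBM
    nl.store(c_tensor, c)

    return c_tensor
```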

nki.profile saves file.neff and profile.ntff, along with JSON files containing a profile summary, in working_directory.
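
To pick up these artifacts programmatically after a profiling run, something like the following sketch could be used. `list_profile_artifacts` is a hypothetical helper, not part of NKI; its default file names mirror the example above, and because the JSON summary file names are not specified here, it assumes they sit directly in working_directory and simply collects every `*.json` file there.

```python
import glob
import os


def list_profile_artifacts(working_directory,
                           neff_name="file.neff",
                           trace_name="profile.ntff"):
    """Collect the files nki.profile is expected to write to working_directory.

    Hypothetical helper, not part of NKI. JSON summary names are not assumed:
    every top-level *.json file in the directory is returned.
    """
    neff_path = os.path.join(working_directory, neff_name)
    trace_path = os.path.join(working_directory, trace_name)

    # Fail loudly if the NEFF or trace file is missing (e.g. wrong directory).
    missing = [p for p in (neff_path, trace_path) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"profile artifacts not found: {missing}")

    json_paths = sorted(glob.glob(os.path.join(working_directory, "*.json")))
    return {"neff": neff_path, "trace": trace_path, "json": json_paths}
```

If save_neff_name or save_trace_name are set to other values, the same names would be passed as neff_name and trace_name.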

See Profiling NKI kernels with Neuron Profile for more information on how to visualize the execution trace.

In addition, more information about neuron-profile can be found in its documentation.

Note

nki.profile does not use the actual values of the inputs passed into the profiled function when it runs the NEFF file; the input tensors are used mainly to specify the shapes of the inputs. For instance, in the example above, the output c_tensor is undefined and should not be used for numerical accuracy checks.
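
Since only the shapes of the inputs matter to nki.profile (and, for the kernel in Listing 13, their dtype, which it reads through a_tensor.dtype), placeholder arrays are enough for a profiling run, while numerical checks need outputs from a separate, non-profiling execution. The sketch below illustrates both; `make_placeholder_inputs` and `reference_tensor_add` are hypothetical helpers, not NKI APIs.

```python
import numpy as np


def make_placeholder_inputs(*specs):
    """Return zero-filled numpy arrays, one per (shape, dtype) pair.

    Hypothetical helper: nki.profile ignores input values, so only the shape
    (and the dtype, if the kernel reads it) has to match the real inputs.
    """
    return [np.zeros(shape, dtype=dtype) for shape, dtype in specs]


def reference_tensor_add(a, b):
    """Hypothetical NumPy reference for nki_tensor_tensor_add.

    Compare it against outputs from a non-profiling run, never against the
    (undefined) output of an nki.profile run.
    """
    return np.add(a, b)
```

For instance, `a, b = make_placeholder_inputs(((128, 512), np.float32), ((128, 512), np.float32))` followed by `nki_tensor_tensor_add(a, b)` would profile the kernel while its return value is ignored; `reference_tensor_add` would then be compared (e.g. with `np.allclose`) against outputs from a run that actually executes on real inputs.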
