nvmath.fft.compile_epilog — NVIDIA nvmath-python

nvmath.fft.compile_epilog(epilog_fn, element_dtype, user_info_dtype, *, compute_capability=None)

Compile a Python function to LTO-IR so that it can be provided as an epilog function to fft() and plan().

Parameters:

Returns:

The function compiled to LTO-IR, as a bytes object.

Examples

The cuFFT library expects the end user to manage scaling of the outputs, so in order to replicate the norm option found in other Python FFT libraries we can define an epilog which performs the scaling.

import cupy as cp
import nvmath
import math

Create the data for a batched 1-D FFT.

B, N = 256, 1024
a = cp.random.rand(B, N, dtype=cp.float64) + 1j * cp.random.rand(B, N, dtype=cp.float64)

Compute a normalization factor that will create unitary transforms.

norm_factor = 1.0 / math.sqrt(N)
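To see why a factor of 1/sqrt(N) yields a unitary transform, the sketch below checks Parseval's relation on a small input using a naive CPU DFT built from the standard library only. The helper `naive_dft` is hypothetical and exists purely for illustration; it is not part of nvmath-python or cuFFT.

```python
import cmath
import math


def naive_dft(x):
    """Unnormalized forward DFT in O(N^2); for illustration only."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


# Small complex test signal (values chosen arbitrarily).
x = [complex(j % 3, -j) for j in range(8)]
N = len(x)
norm_factor = 1.0 / math.sqrt(N)

# Scale every output element, as the epilog in this example does.
X = [v * norm_factor for v in naive_dft(x)]

# A unitary transform preserves the sum of squared magnitudes.
energy_in = sum(abs(v) ** 2 for v in x)
energy_out = sum(abs(v) ** 2 for v in X)
```

Without the scaling, `energy_out` would be larger than `energy_in` by a factor of N.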

Define the epilog function for the FFT.

def rescale(data_out, offset, data, user_info, unused):
    data_out[offset] = data * norm_factor
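As a rough mental model, a store callback of this kind is invoked once per output element, receives that element's offset and computed value, and is responsible for writing the result. The CPU-only sketch below emulates this with a plain Python list. The driver `apply_epilog` is hypothetical and does not reflect how nvmath-python or cuFFT actually invoke the compiled epilog on the GPU.

```python
import math

N = 4
norm_factor = 1.0 / math.sqrt(N)  # 0.5 for N = 4


def rescale(data_out, offset, data, user_info, unused):
    # Same body as the epilog above: write the scaled value at its offset.
    data_out[offset] = data * norm_factor


def apply_epilog(epilog_fn, values, user_info=None):
    """Hypothetical host-side driver: call the epilog once per element."""
    data_out = [0j] * len(values)
    for offset, value in enumerate(values):
        epilog_fn(data_out, offset, value, user_info, None)
    return data_out


# Stand-in for unscaled FFT output values.
raw = [2 + 0j, 0 + 2j, -2 + 0j, 0 - 2j]
scaled = apply_epilog(rescale, raw)
```

Because the epilog writes the output itself, the scaling is fused into the store step instead of requiring a second pass over the result.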

Compile the epilog to LTO-IR. On a system with GPUs that have different compute capabilities, the compute_capability option must be passed to the compile_prolog or compile_epilog helper. Alternatively, the epilog can be compiled in the context of the device on which the FFT that uses it will execute. Here we use the current device context, where the operands were created.

with cp.cuda.Device():
    epilog = nvmath.fft.compile_epilog(rescale, "complex128", "complex128")
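On a system with mixed GPUs, one way to organize the explicit route is to compile the epilog once per distinct compute capability and reuse the result. The sketch below shows only that bookkeeping in plain Python. Both `compile_per_capability` and `fake_compile` are hypothetical, and capability strings such as "80" are an assumed format. In real use, `compile_fn` would wrap a call to compile_epilog with the compute_capability option.

```python
def compile_per_capability(device_capabilities, compile_fn):
    """Hypothetical helper: compile once per distinct compute capability.

    device_capabilities maps a device id to a capability string
    (assumed format, e.g. "80"); compile_fn maps a capability string
    to the compiled LTO-IR bytes.
    """
    cache = {}
    for cc in device_capabilities.values():
        if cc not in cache:
            cache[cc] = compile_fn(cc)
    return cache


calls = []


def fake_compile(cc):
    # Stand-in for the real compilation step; records each invocation.
    calls.append(cc)
    return ("ltoir-for-sm" + cc).encode()


# Three devices, two distinct capabilities (made-up configuration).
devices = {0: "80", 1: "90", 2: "80"}
ltoir_by_cc = compile_per_capability(devices, fake_compile)
```

Each device then looks up the LTO-IR matching its own capability, so devices 0 and 2 share one compiled epilog.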

Perform the forward FFT, applying the rescaling as an epilog.

r = nvmath.fft.fft(a, axes=[-1], epilog=dict(ltoir=epilog))

Test that the fused FFT run result matches the result of other libraries.

s = cp.fft.fftn(a, axes=[-1], norm="ortho")
assert cp.allclose(r, s)

Notes