coder.GpuCodeConfig - Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder

coder.gpuConfig

Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder

Description

The coder.GpuCodeConfig or coder.gpuConfig object contains the configuration parameters that codegen uses for generating CUDA® MEX, a static library, a dynamically linked library, or an executable program with GPU Coder™. Pass the object to the codegen function by using the -config option.

Creation

Syntax

Description

cfg = coder.gpuConfig(build_type) creates a code generation configuration object for the specified build type, which can be CUDA MEX, a static library, a dynamically linked library, or an executable program. If the Embedded Coder® product is installed, it creates a coder.EmbeddedCodeConfig object for static library, dynamic library, or executable build types.


cfg = coder.gpuConfig(build_type,'ecoder',false) creates a code generation configuration object to generate CUDA 'lib', 'dll', or 'exe' output even if the Embedded Coder product is installed.

cfg = coder.gpuConfig(build_type,'ecoder',true) creates a coder.EmbeddedCodeConfig configuration object even if the Embedded Coder product is not installed. However, code generation using a coder.EmbeddedCodeConfig object requires an Embedded Coder license.
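For instance, a minimal sketch comparing the two options; the class of the returned configuration object also depends on whether Embedded Coder is installed:

% Non-Embedded-Coder configuration object for a static library
cfgLib = coder.gpuConfig('lib','ecoder',false);

% Embedded Coder configuration object (code generation with it requires an
% Embedded Coder license)
cfgEc = coder.gpuConfig('lib','ecoder',true);

% Inspect the class of each object
class(cfgLib)
class(cfgEc)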

Input Arguments


build_type — Output to build from generated CUDA code

'MEX' | 'LIB' | 'DLL' | 'EXE'

Output to build from generated CUDA code, specified as one of the values in this table.

Value Description
'MEX' CUDA MEX
'LIB' Static library
'DLL' Dynamically linked library
'EXE' Executable program

Properties


coder.GpuConfig contains only the GPU-specific configuration parameters of the code configuration object. To see the properties of the code configuration object, see coder.CodeConfig and coder.EmbeddedCodeConfig.

Enabled — Control GPU code generation

true (default) | false

Control generation of CUDA (*.cu) files by using one of the values in this table.

Value Description
true This value is the default value. Enables CUDA code generation.
false Disables CUDA code generation.

Example: cfg.GpuConfig.Enabled = true

MallocMode — GPU memory allocation

'discrete' (default) | 'unified'

Memory allocation (malloc) mode to be used in the generated CUDA code, specified as one of the values in this table.

Value Description
'discrete' This value is the default value. The generated code uses the cudaMalloc API for transferring data between the CPU and the GPU. From the programmer's point of view, the discrete mode has a traditional memory architecture with separate CPU and GPU global memory address spaces.
'unified' The generated code uses the cudaMallocManaged API, which uses a shared (unified) CPU and GPU global memory address space. For NVIDIA® embedded targets only. See unified memory allocation mode on host being removed.

For more information, see Discrete and Managed Modes.

Example: cfg.GpuConfig.MallocMode = 'discrete'
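For instance, a sketch of selecting unified memory when targeting an NVIDIA embedded board. The coder.hardware board name shown is an assumption that depends on which hardware support package is installed:

cfg = coder.gpuConfig('exe');
cfg.GpuConfig.MallocMode = 'unified';            % shared CPU/GPU address space
cfg.Hardware = coder.hardware('NVIDIA Jetson');  % assumed board name from the NVIDIA support package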

KernelNamePrefix — Custom kernel name prefixes

' ' (default) | character vector

Specify a custom name prefix for all the kernels in the generated code. For example, using the value 'CUDA_' creates kernels with names CUDA_kernel1, CUDA_kernel2, and so on. If no name is provided, GPU Coder prepends the kernel name with the name of the entry-point function. Kernel names can contain uppercase letters, lowercase letters, digits 0–9, and the underscore character _. GPU Coder removes unsupported characters from the kernel names and appends alpha to prefixes that do not begin with an alphabetic letter.

Example: cfg.GpuConfig.KernelNamePrefix = 'myKernel'

EnableCUBLAS — Use cuBLAS library

true (default) | false

Replacement of math function calls with NVIDIA cuBLAS library calls, specified as one of the values in this table.

Value Description
true This value is the default value. Allows GPU Coder to replace corresponding math function calls with calls to the cuBLAS library. For functions that have no replacements in CUDA, GPU Coder uses portable MATLAB® functions and attempts to map them to the GPU.
false Disables the use of the cuBLAS library in the generated code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUBLAS = true

EnableCUSOLVER — Use cuSOLVER library

true (default) | false

Replacement of math function calls with NVIDIA cuSOLVER library calls, specified as one of the values in this table.

Value Description
true This value is the default value. Allows GPU Coder to replace corresponding math function calls with calls to the cuSOLVER library. For functions that have no replacements in CUDA, GPU Coder uses portable MATLAB functions and attempts to map them to the GPU.
false Disables the use of the cuSOLVER library in the generated code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUSOLVER = true

EnableCUFFT — Use cuFFT library

true (default) | false

Replacement of fft function calls with NVIDIA cuFFT library calls, specified as one of the values in this table.

Value Description
true This value is the default value. Allows GPU Coder to replace appropriate fft calls with calls to the cuFFT library.
false Disables use of the cuFFT library in the generated code. With this option, GPU Coder uses C FFTW libraries where available or generates kernels from portable MATLAB fft code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUFFT = true

Benchmarking — Add benchmarking to the generated code

false (default) | true

Control addition of benchmarking code to the generated CUDA code by using one of the values in this table.

Value Description
false This value is the default value. The generated CUDA code does not contain benchmarking functionality.
true Generates CUDA code with benchmarking functionality. This option uses CUDA APIs such as cudaEvent to time kernel, memcpy, and other events.

After execution, the generated benchmarking code creates the gpuTimingData comma-separated values (CSV) file in the current working folder. The CSV file contains timing data for kernel, memory, and other events. The table describes the format of the CSV file.

Event Type Format
CUDA kernels <name_N>,,,, — N is the Nth execution of the kernel. One of the remaining fields represents the total block dimension; for example, if the block dimension is dim3(32,32,32), then the value is 32768.
CUDA memory copy <name_N>,,,, — N is the Nth execution of the memory copy.
Miscellaneous <name_N>,, — N is the Nth execution of the operation.

Example: cfg.GpuConfig.Benchmarking = true
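For instance, a minimal sketch of enabling benchmarking and reading the timing data after running the generated MEX. The entry-point name myGpuFcn is hypothetical, and the exact CSV file name and column layout are assumptions based on the description above:

cfg = coder.gpuConfig('mex');
cfg.GpuConfig.Benchmarking = true;
codegen -config cfg -args {zeros(512,512)} myGpuFcn   % myGpuFcn is a placeholder entry point

% Run the generated MEX so that it writes the timing file, then inspect it
myGpuFcn_mex(zeros(512,512));
T = readtable('gpuTimingData.csv');   % file name assumed from the description above
head(T)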

SafeBuild — Error checking in the generated code

false (default) | true

Add error-checking functionality to the generated CUDA code by using one of the values in this table.

Value Description
false This value is the default value. The generated CUDA code does not contain error-checking functionality.
true Generates code with error-checking for CUDA API and kernel calls.

Example: cfg.GpuConfig.SafeBuild = true

ComputeCapability — Minimum compute capability for code generation

'Auto' (default) | '3.2' | '3.5' | '3.7' | '5.0' | '5.2' | '5.3' | '6.0' | '6.1' | '6.2' | '7.0' | '7.2' | '7.5' | '8.0' | '8.6' | '8.7' | '8.9' | '9.0'

ComputeCapability specifies the minimum compute capability of an NVIDIA GPU device for which CUDA code is generated. CUDA compute capability is a numerical representation of the capabilities and features provided by a GPU architecture for executing CUDA code. The compute capability version is denoted by a major and minor version number and determines the available hardware features, instruction sets, memory capabilities, and other GPU-specific functionalities that can be utilized by CUDA programs. It also affects the compatibility and performance of CUDA code on different GPUs.

For example, a GPU with compute capability 7.0 has more features and capabilities than a GPU with compute capability 3.2. Newer compute capabilities generally introduce enhancements, improved performance, and additional features, allowing you to take advantage of the latest GPU architecture advancements. Certain CUDA features might have specific compute capability requirements. To see the CUDA compute capability requirements for code generation, consult the following table.

Target Compute Capability
CUDA MEX See GPU Computing Requirements.
Source code, static or dynamic library, and executables 3.2 or higher.
Deep learning applications in 8-bit integer precision 6.1, 6.3 or higher.
Deep learning applications in half-precision (16-bit floating point) 5.3, 6.0, 6.2 or higher.

If you specify a custom compute capability by using the CustomComputeCapability property, GPU Coder ignores this setting.

When ComputeCapability is set to 'Auto', the software uses the compute capability of the GPU device that you select for GPU code generation. If no GPU device is available or if the software is unable to detect a GPU device, the code generator uses a compute capability value of 5.0.

Example: cfg.GpuConfig.ComputeCapability = '6.1'
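For example, a small sketch that queries the currently selected GPU (using Parallel Computing Toolbox) and sets the configuration to that device's compute capability:

gpu = gpuDevice;                                          % currently selected GPU device
cfg = coder.gpuConfig('lib');
cfg.GpuConfig.ComputeCapability = gpu.ComputeCapability;  % for example, '7.5'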

CustomComputeCapability — Custom NVIDIA GPU architecture for code generation

'' (default) | character vector

Specify the name of the NVIDIA virtual GPU architecture for which the CUDA input files must be compiled.

For example, to specify a virtual architecture, use -arch=compute_50. You can specify a real architecture by using -arch=sm_50. For more information, see the _Options for Steering GPU Code Generation_ topic in the CUDA Toolkit documentation.

Example: cfg.GpuConfig.CustomComputeCapability = '-arch=compute_50'

CompilerFlags — Additional flags to the GPU compiler

'' (default) | character vector

Pass additional flags to the GPU compiler. For example, --fmad=false instructs the nvcc compiler to disable contraction of floating-point multiply and add to a single Floating-Point Multiply-Add (FMAD) instruction.

For similar NVIDIA compiler options, see the topic on NVCC Command Options in the CUDA Toolkit documentation.

Example: cfg.GpuConfig.CompilerFlags = '--fmad=false'
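As another sketch, multiple nvcc options can be passed in one character vector; whether a particular flag is appropriate depends on your CUDA Toolkit version:

cfg = coder.gpuConfig('dll');
% Disable FMAD contraction and embed source line information for profiling
cfg.GpuConfig.CompilerFlags = '--fmad=false -lineinfo';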

StackLimitPerThread — Stack limit per GPU thread

1024 (default) | integer

Specify the maximum stack limit per GPU thread as an integer value.

Example: cfg.GpuConfig.StackLimitPerThread = 1024

MallocThreshold — Malloc threshold

200 (default) | integer

Specify the size above which the private variables are allocated on the heap instead of the stack, as an integer value.

Example: cfg.GpuConfig.MallocThreshold = 256

MaximumBlocksPerKernel — Maximum number of blocks created during a kernel launch

0 (default) | integer

Specify the maximum number of blocks created during a kernel launch.

Because GPU devices have limited streaming multiprocessor (SM) resources, limiting the number of blocks for each kernel can avoid performance losses from scheduling, loading and unloading of blocks.

If the number of iterations in a loop is greater than the maximum number of blocks per kernel, the code generator creates CUDA kernels with striding.

When you specify the maximum number of blocks for each kernel, the code generator creates 1-D kernels. To force the code generator to create 2-D or 3-D kernels, use the coder.gpu.kernel pragma. The coder.gpu.kernel pragma takes precedence over the maximum number of blocks for each kernel.

Example: cfg.GpuConfig.MaximumBlocksPerKernel = 1024
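As an illustration, a sketch of an entry-point function that uses the coder.gpu.kernel pragma to request a 2-D launch configuration for a loop nest; the function name and the grid and block dimensions shown are arbitrary example values:

function B = scale2d(A) %#codegen
B = coder.nullcopy(zeros(size(A)));
% Request a 2-D kernel; [grid dims] and [block dims] are illustrative values
coder.gpu.kernel([32 32 1],[16 16 1]);
for i = 1:size(A,1)
    for j = 1:size(A,2)
        B(i,j) = 2*A(i,j);
    end
end
end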

EnableMemoryManager — Use GPU memory manager

true (default) | false

Use the GPU memory manager for efficient memory allocation and management and for improved run-time performance, specified as one of the values in this table.

Value Description
true The GPU memory manager creates a collection of large GPU memory pools and manages allocation and deallocation of chunks of memory blocks within these pools. By creating large memory pools, the memory manager reduces the number of calls to the CUDA memory APIs, improving run-time performance. You can use the GPU memory manager for MEX and standalone CUDA code generation. This value is the default value.
false Disables the use of the GPU memory manager for memory allocation and management.

Example: cfg.GpuConfig.EnableMemoryManager = true

SelectCudaDevice — CUDA device selection

-1 (default) | deviceID

In a multi-GPU environment such as NVIDIA Drive platforms, specify the CUDA device to target.

Example: cfg.GpuConfig.SelectCudaDevice = <DeviceID>

Note

SelectCudaDevice can be used with gpuArray only if gpuDevice and SelectCudaDevice point to the same GPU. If gpuDevice points to a different GPU, a CUDA_ERROR_INVALID_VALUE runtime error is thrown.
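For example, a sketch of keeping gpuDevice and SelectCudaDevice pointed at the same physical GPU. The index 2 is arbitrary, and the assumption that the CUDA device ID equals the MATLAB device index minus one should be verified on your system:

gpuDevice(2);                        % select GPU 2 in MATLAB (1-based index)
cfg = coder.gpuConfig('mex');
cfg.GpuConfig.SelectCudaDevice = 1;  % assumed corresponding CUDA device ID (0-based)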

Examples


Generate CUDA MEX

Generate CUDA MEX function from a MATLAB function that is suitable for GPU code generation. Also, enable a code generation report.

Write a MATLAB function VecAdd that performs vector addition of the inputs A and B.

function [C] = VecAdd(A,B) %#codegen
    C = coder.nullcopy(zeros(size(A)));
    coder.gpu.kernelfun();
    C = A + B;
end

To generate a MEX function, create a code generation configuration object.

cfg = coder.gpuConfig('mex');

Enable the cuBLAS library and the code generation report.

cfg.GpuConfig.EnableCUBLAS = true;
cfg.GenerateReport = true;

Generate a MEX function in the current folder specifying the configuration object using the -config option.

% Generate a MEX function and code generation report
codegen -config cfg -args {zeros(512,512,'double'),zeros(512,512,'double')} VecAdd
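After code generation completes, you can call the generated MEX function from MATLAB. By default, codegen names the MEX function by appending _mex to the entry-point name:

A = rand(512);
B = rand(512);
C = VecAdd_mex(A,B);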

Limitations

Version History

Introduced in R2017b


R2024a: GPU memory manager is enabled by default

In previous releases, the default value of the EnableMemoryManager property was false. Now, the default value has changed to true. Therefore, when you generate CUDA code, the GPU memory manager is enabled by default.

Because of this change, once you generate a CUDA MEX with the default configuration setting, you cannot run this MEX on a different GPU. If you want to run the generated MEX on a different GPU, set the EnableMemoryManager property to false before you generate code.
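For example, a minimal sketch of regenerating a MEX so that it can run on a different GPU:

cfg = coder.gpuConfig('mex');
cfg.GpuConfig.EnableMemoryManager = false;   % disable the memory manager for cross-device portability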

In previous releases, the GPU memory manager provided code configuration parameters to manage the allocation and deallocation of memory blocks in the GPU memory pools. These properties have now been removed.


R2024a: Change to default compute capability value in code configuration

The default value of the ComputeCapability property is now 'Auto' instead of '3.5'. When compute capability is set to 'Auto', the code generator detects and uses the compute capability of the GPU device that you have selected for GPU code generation. If no GPU device is available or if the code generator is unable to detect a GPU device, the code generator uses a compute capability value of '5.0'.

For Simulink® Coder™, the default compute capability value is now '5.0' instead of '3.5'. To change this default value, modify the Compute capability parameter in the Configuration Parameters dialog box. For more information, see Compute capability (Simulink Coder).

R2021a: unified memory allocation mode on host being removed

In a future release, the unified memory allocation (cudaMallocManaged) mode will be removed when targeting NVIDIA GPU devices on the host development computer. You can continue to use unified memory allocation mode when targeting NVIDIA embedded platforms.

When generating CUDA code for the host from MATLAB, set the MallocMode property of the coder.gpuConfig code configuration object to 'discrete'.
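For example:

cfg = coder.gpuConfig('mex');
cfg.GpuConfig.MallocMode = 'discrete';   % use discrete mode instead of unified mode on the host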

See Also

Apps

Functions

Objects

Topics