How Shared GPU Memory Manager Improves Performance of Generated MEX - MATLAB & Simulink
You can use the GPU memory manager for efficient memory allocation and management and to improve run-time performance. The GPU memory manager creates a collection of large GPU memory pools and manages the allocation and deallocation of chunks of memory blocks within these pools. By creating large memory pools, the memory manager reduces the number of calls to the CUDA® memory APIs, improving run-time performance. See GPU Memory Allocation and Minimization.
In particular, when you generate CUDA MEX code, GPU Coder™ creates a single universal memory manager that handles memory management for all running CUDA MEX functions, further improving the performance of the MEX functions. To view the properties of the shared MEX memory manager and manage its allocations, create a gpucoder.MemoryManager object by using the cudaMemoryManager function. To free GPU memory that is not in use, call the freeUnusedMemory function. This topic uses an example to explain how the shared memory manager works.
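For example, this minimal sketch shows how you might inspect the shared memory manager and release its unused pooled memory from the MATLAB command line. It uses only the cudaMemoryManager and freeUnusedMemory functions described above; the exact properties that the object displays depend on your GPU Coder release.

memMgr = cudaMemoryManager    % gpucoder.MemoryManager object for the shared MEX memory manager
freeUnusedMemory(memMgr);     % release GPU memory pools that are not currently in use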
Obtain Fog Rectification Example Files
This example uses the design file fog_rectification.m and the image file foggyInput.png from the Fog Rectification example. To create a folder that contains these files, run this command.
openExample('gpucoder/FogRectificationGPUExample')
Generate and Profile CUDA MEX with GPU Memory Manager Disabled
Create a GPU code configuration object for generating a MEX function. To generate code that does not use the memory manager, set the EnableMemoryManager property to false.
cfg = coder.gpuConfig("mex"); cfg.GpuConfig.EnableMemoryManager = false;
Generate and profile CUDA MEX code for the design file fog_rectification.m by using the gpuPerformanceAnalyzer function. Specify the input type by using an example value, inputImage, which is the variable into which you load the foggyInput.png image file. Run the GPU Performance Analyzer with the default iteration count of 2.
inputImage = imread("foggyInput.png"); gpuPerformanceAnalyzer("fog_rectification",{inputImage},Config=cfg);
In the Performance Analyzer report, observe that a significant portion of the execution time is spent on memory allocation and deallocation.
Generate and Profile CUDA MEX with GPU Memory Manager Enabled
Enable the GPU memory manager. Then, generate and profile the CUDA MEX function again.
cfg.GpuConfig.EnableMemoryManager = true; gpuPerformanceAnalyzer("fog_rectification",{inputImage},Config=cfg);
Observe that most of the memory allocation and deallocation events no longer appear in the profiling report, and the generated MEX function therefore has improved run-time performance.
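One way to quantify this improvement outside the Performance Analyzer report is to time the generated MEX function directly, for example with timeit. This is a sketch under the assumption that fog_rectification_mex is on the path after code generation and accepts a single image input; the warm-up call keeps the one-time pool allocation out of the measurement.

fog_rectification_mex(inputImage);                  % warm-up call; allocates the shared memory pools
t = timeit(@() fog_rectification_mex(inputImage));  % average run time over repeated calls
fprintf('Average fog_rectification_mex run time: %.4f s\n', t);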
Shared Memory Manager Allocations and Deallocations
To see when the shared GPU memory manager allocates the large GPU memory pools, select the first run of fog_rectification_mex in the profiling report.
Observe that, compared to the second run, the first run has three extra GPU memory allocation events in the timeline graph. These events correspond to the allocation of three memory pools by the shared GPU memory manager. Subsequent runs of fog_rectification_mex reuse the memory pools allocated in the first run, thereby improving run-time performance.
For MEX code generation, the memory pools allocated for fog_rectification_mex are preserved after fog_rectification_mex finishes its first execution. This behavior allows subsequent MEX functions to reuse the memory pools allocated for fog_rectification_mex. However, for standalone CUDA code generation, the memory pools are private to the target (executable, static library, or dynamic library) and are deallocated when the standalone target is unloaded from memory.