dspunfold
Generates a multi-threaded MEX file from a MATLAB function
Syntax
Description
dspunfold [file](#buxgz0h-1-file) generates a multi-threaded MEX file from the entry-point MATLAB® function specified by file, using the unfolding technology. Unfolding is a technique to improve throughput through parallelization. The multi-threaded MEX file leverages the multicore CPU architecture of the host computer and can improve speed significantly. In addition to the multi-threaded MEX file, the function generates a single-threaded MEX file, a self-diagnostic analyzer function, and the corresponding help files.
dspunfold [options](#buxgz0h-1-options) [file](#buxgz0h-1-file) generates a multi-threaded MEX file from the entry-point MATLAB function specified by file, using the function arguments specified by options.
Note
This function requires a MATLAB Coder™ license.
Input Arguments
Option | Values | Description | Examples |
---|---|---|---|
-args arguments | Cell array | Argument types for the entry-point MATLAB function, specified as a cell array. The cell array accepts numeric elements, the coder.typeof function, and the coder.Constant function. The generated multi-threaded MEX file is specialized to the size, class, and complexity of arguments. | The number of elements in the cell array must be the same as the number of arguments that the entry-point MATLAB function expects. dspunfold fcn -args {ones(10,1), 5}: dspunfold extracts the type (size, class, and complexity) information from the elements in the arguments cell array. fcn is the entry-point MATLAB function. dspunfold fcn -args {coder.typeof(ones(10,1)), coder.typeof(5)}: coder.typeof is used to specify the types of the fcn arguments. dspunfold fcn -args {coder.Constant(ones(10,1)), coder.Constant(5)}. dspunfold fcn -args {}: By default, arguments is {}. An empty cell array {} indicates that fcn accepts no input arguments. |
-o output | Character vector | Name of the output multi-threaded MEX file, specified as a character vector. If no output name is specified, the name of the generated multi-threaded MEX file is inherited from the input MATLAB function with an '_mt' suffix. dspunfold also adds a platform-specific extension to this name. In addition, dspunfold generates a single-threaded MEX file with an '_st' suffix, and a test bench file with an '_analyzer' suffix. | No output name specified: dspunfold fcn. Files generated: fcn_mt.mexw64, fcn_st.mexw64, fcn_analyzer.p. Output name specified: dspunfold fcn -o foo. Files generated: foo.mexw64, foo_st.mexw64, foo_analyzer.p |
-s statelength | Scalar integer greater than or equal to zero, or auto | State length of the algorithm in the entry-point MATLAB function, specified as a scalar integer greater than or equal to zero, or auto. By default, the statelength is zero frames, indicating that the algorithm is stateless. If at least one entry of frameinputs is true, statelength is considered in samples. For information on frames and samples, see Sample- and Frame-Based Concepts. -s auto triggers automatic state length detection. In this mode, you must provide numeric inputs to the arguments cell array. These inputs detect the state length of the algorithm. You can input coder.Constant but not coder.typeof. When automatic state length detection is invoked, it is recommended that you provide random inputs to the arguments array. See Automatic State Length Detection. | dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3 -f [false, false, false]: State length is three frames. dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3 -f [true, false, false]: State length is three samples. State length is considered in samples, because at least one entry of the -f option is true. dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s auto: Automatic state length detection is invoked. dspunfold fcn -args {coder.typeof(randn(10,1)), coder.typeof(randn(10,1)), coder.typeof(randn(10,1))} -s auto generates this error message: The input argument 1 is of type coder.PrimitiveType which is not supported when using -s auto |
-f frameinputs | Scalar logical, or vector of logical values | Frame status of input arguments for the entry-point MATLAB function, specified as one of true or false. true: Input is in frames and can be subdivided into samples without changing the system behavior. false: Input cannot be subdivided into samples without changing the system behavior. For example, you cannot subdivide the coefficients of a filter without changing the characteristics of the filter. By default, frameinputs is false. frameinputs set to a scalar logical value sets the frame status of all the inputs simultaneously. To specify statelength in samples, set at least one entry of frameinputs to true. If frameinputs is not specified, the unit of statelength is frames. | dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3 -f true: All the inputs are marked as frames. State length is three samples. dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3 -f [true, false, false]: State length is three samples. dspunfold fcn -args {randn(10,1), randn(10,1), randn(10,1)} -s 3: The default value of frameinputs is false. State length is three frames. |
-r repetition | Positive integer | Repetition factor used to generate the multi-threaded MEX file, specified as a positive integer. The default value of repetition is 1. See Repetition Factor. | dspunfold fcn -args {randn(10,2), randn(20,2), randn(30,3)} -r 2 |
-t threads | Positive integer | Number of threads used by the multi-threaded MEX file, specified as a positive integer. The default value of threads is the number of physical CPU cores present on your machine. See Threads. | dspunfold fcn -args {randn(10,1), randn(20,2), randn(30,3)} -t 4 |
-v verbose | Scalar logical | Option to show verbose output during code generation, specified as true or false. The default is true. | dspunfold fcn -args {randn(10,1), randn(20,2), randn(30,3)} -v true. dspunfold fcn -args {randn(10,1), randn(20,2), randn(30,3)} -v false |
file is the entry-point MATLAB function from which dspunfold generates the multi-threaded MEX file. The function must support code generation.
Example: dspunfold fcn -args {randn(10,1),randn(10,2),randn(20,1)}
fcn is the entry-point MATLAB function and {randn(10,1),randn(10,2),randn(20,1)} are its input arguments.
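The way the -f option determines the unit of the -s value can be summarized in a few lines of plain MATLAB. This is only an illustrative sketch and not part of dspunfold: the variable names (cases, units) are made up for the example, and the rule it encodes is the one stated in the table, namely that the state length is counted in samples when at least one frameinputs entry is true, and in frames otherwise.

```matlab
% Illustrative only: mimic how the unit of statelength follows from -f.
% 'cases' and 'units' are hypothetical names, not dspunfold outputs.
cases = {true, [true false false], false, [false false false]};
units = cell(size(cases));
for k = 1:numel(cases)
    frameinputs = cases{k};
    if any(frameinputs)
        units{k} = 'samples';   % at least one input is marked as frames
    else
        units{k} = 'frames';    % no input is marked as frames (default)
    end
    fprintf('-f %-20s -> state length in %s\n', mat2str(frameinputs), units{k});
end
```

A scalar value such as -f true applies to every input, so it falls under the same rule as a vector whose entries are all true.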
Output Files
When you invoke dspunfold on an entry-point MATLAB function, dspunfold generates the following files.
File | Value | Description | Examples |
---|---|---|---|
Multi-threaded MEX file | MEX file | Multi-threaded MEX file generated from the entry-point MATLAB function. The MEX file inherits the output name. If no output name is specified, the name of this file is inherited from the MATLAB function with an '_mt' suffix. A platform-specific extension is also added to the name. | dspunfold fcn -o foo generates foo.mexw64. dspunfold fcn generates fcn_mt.mexw64 |
Help file for the multi-threaded MEX file | MATLAB file | MATLAB help file for the multi-threaded MEX file. The help file has the same name as the MEX file, but with an '.m' extension. To invoke the help file, type help followed by the MEX file name at the MATLAB command prompt. This help file displays information on how to invoke the MEX file, its syntax, latency, and types (size, class, and complexity) of the inputs to the MEX file. In addition, the help file documents the parameters used by dspunfold: Threads, Repetition, and State length. This information is useful when you are invoking the MEX file. The syntax to invoke the MEX file should be the same as the syntax shown in the help file. | help foo. help fcn_mt |
Single-threaded MEX file | MEX file | Single-threaded MEX file generated from the entry-point MATLAB function. The MEX file inherits the output name with an '_st' suffix. If no output name is specified, the name of this file is inherited from the MATLAB function with an '_st' suffix. A platform-specific extension is also added to the name. Use this file as a benchmark to compare against the speed of the multi-threaded MEX file. | dspunfold fcn -o foo generates foo_st.mexw64. dspunfold fcn generates fcn_st.mexw64 |
Help file for the single-threaded MEX file | MATLAB file | MATLAB help file for the single-threaded MEX file. The help file has the same name as the MEX file, but with an '.m' extension. To invoke the help file, type help followed by the MEX file name at the MATLAB command prompt. The help file displays information on how to invoke the MEX file, its syntax, and types (size, class, and complexity) of the inputs to the MEX file. The syntax to invoke the MEX file should be the same as the syntax shown in the help file. | help foo_st. help fcn_st |
Self-diagnostic analyzer function | P-coded file | report = function_analyzer(input 1, input 2, ..., input n) measures the difference in speed between the multi-threaded MEX file and the single-threaded MEX file. The analyzer also verifies that the output values match. report = function_analyzer('latency') reports the latency of the multi-threaded MEX file introduced by unfolding. report contains the following fields. Latency: The value of the latency (in frames). Speedup: The speedup of the multi-threaded MEX file relative to the single-threaded MEX file. If you specified the 'latency' option, the value of this field is empty []. Pass: Logical value that shows if the outputs match between the generated multi-threaded MEX file and the single-threaded MEX file. If you specified the 'latency' option, the value of this field is empty []. The first dimension of the analyzer inputs must be a multiple of the first dimension of the corresponding inputs given to the -args option. The other dimensions must match exactly. The analyzer inherits the output name with an '_analyzer' suffix. If no output name is specified, the name of this file is inherited from the MATLAB function with an '_analyzer' suffix. | Multiple frames with different values are specified along the first dimension. Example 1: report = foo_analyzer(randn(10*2,1), randn(20*2,2), randn(30*3,3)). Example 2: report = foo_analyzer([randn(10,1);randn(10,1)],[randn(20,1);randn(20,1)],[randn(30,1);randn(30,1);randn(30,1)]). Latency only: report = foo_analyzer('latency') |
Help file for the self-diagnostic analyzer function | MATLAB file | Help file for the self-diagnostic analyzer function. The help file has the same name as the analyzer function, but with an '.m' extension. To invoke the help file, type help <function_analyzer> in MATLAB. The help file for the self-diagnostic analyzer function displays information on how to invoke the analyzer function, its syntax, and types (size, class, and complexity) of the inputs to the analyzer function. The syntax to invoke the analyzer function should be the same as the syntax shown in the help file. | help foo_analyzer |
Limitations
General Limitations:
- On Windows and Linux, you must use a compiler that supports the Open Multiprocessing (OpenMP) application interface. See Supported Compilers.
- On macOS with Xcode version 12.0 or later, using the dspunfold function is not supported.
- If the input MATLAB function has runtime errors, the errors are not caught when you run the multi-threaded MEX file. Before you use the dspunfold function, call codegen on the MATLAB function and make sure that the MEX file is generated successfully.
- If the generated code uses a large amount of memory to store the local variables, around 4 MB on Windows platform, the generated multi-threaded MEX file can have unexpected behavior. This limit varies with each platform. As a workaround, reduce the size of the input signals or restructure the MATLAB function to use less local memory.
- dspunfold does not support:
  - varargin and varargout inside the MATLAB function
  - Variable-size inputs and outputs
  - Input signals with an arbitrary frame length to System objects that use the DecimationFactor property. The input signal is considered to have an arbitrary frame length when its frame length is not a multiple of the decimation factor. When this is the case, the output of the object in the generated code is a variable-size signal, and dspunfold does not support variable-size output signals. In the case of the dsp.FarrowRateConverter object, you can determine the decimation factor using the getRateChangeFactors function.
  - P-coded entry-point MATLAB functions
  - Cell arrays as inputs and outputs
Analyzer Limitations:
The following limitations apply to the analyzer function generated by the dspunfold function. For more information on the analyzer function, see 'Self-Diagnostic Analyzer' in the 'More About' section of dspunfold.
- If multiple frames of the analyzer input are identical, the analyzer might throw false positive pass results. It is recommended that you provide at least two different frames for each input of the analyzer.
- If the algorithm in the entry-point MATLAB function chooses its state length based on the input values, the analyzer might provide different pass results for different input values. For an example, see the FIR_Mean function in Why Does the Analyzer Choose the Wrong State Length?.
- If the input to the entry-point MATLAB function does not affect the output immediately, the analyzer might throw false positive pass results. For an example, see the Input_Output function in Why Does the Analyzer Choose a Zero State Length?.
- If the output results of the multi-threaded MEX file and single-threaded MEX file match statistically but do not match numerically, the analyzer does not pass. Consider the FilterNoise function that follows, which filters a random noise signal with an FIR filter. The function calls randn from within itself to generate random noise. Hence, the output results of the FilterNoise function match statistically but do not match numerically.
function Output = FilterNoise(x)
    persistent FIRFilter
    if isempty(FIRFilter)
        FIRFilter = dsp.FIRFilter('Numerator',fir1(12,0.4));
    end
    Output = FIRFilter(x+randn(1000,1));
end
When you run the automatic state length detection tool on FilterNoise, the tool detects an infinite state length. Because the tool cannot find a numerical match for a finite state length, it chooses an infinite state length.
dspunfold FilterNoise -args {randn(1000,1)} -s auto
Analyzing input MATLAB function FilterNoise
Creating single-threaded MEX file FilterNoise_st.mexw64
Searching for minimal state length (this might take a while)
Checking stateless ... Insufficient
Checking 1 ... Insufficient
Checking Infinite ... Sufficient
Checking 2 ... Insufficient
Minimal state length is Inf
Creating multi-threaded MEX file FilterNoise_mt.mexw64
Warning: The multi-threading was disabled due to performance considerations.
This happens when the state length is greater than or
equal to (Threads-1)*Repetition frames (3 frames in this case).
In coder.internal.warning (line 8)
In unfoldingEngine/BuildParallelSolution (line 25)
In unfoldingEngine/generate (line 207)
In dspunfold (line 234)
Creating analyzer file FilterNoise_analyzer
The algorithm does not need an infinite state. The state length of the FIR filter, and hence of the algorithm, is 12.
Call dspunfold with the state length set to 12.
dspunfold FilterNoise -args {randn(1000,1)} -s 12 -f true
Analyzing input MATLAB function FilterNoise
Creating single-threaded MEX file FilterNoise_st.mexw64
Creating multi-threaded MEX file FilterNoise_mt.mexw64
Creating analyzer file FilterNoise_analyzer
Run the analyzer function.
FilterNoise_analyzer(randn(1000*4,1))
Analyzing multi-threaded MEX file FilterNoise_mt.mexw64 ...
Latency = 8 frames
Speedup = 0.5x
Warning: The output results of the multi-threaded MEX file FilterNoise_mt.mexw64 do not
match the output results of the single-threaded MEX file FilterNoise_st.mexw64. Check that
you provided the correct state length value to the dspunfold function when you generated the
multi-threaded MEX file FilterNoise_mt.mexw64. For best practices and possible solutions to
this problem, see the 'Tips' section in the dspunfold function reference page.
In coder.internal.warning (line 8)
In FilterNoise_analyzer
ans =
Latency: 8
Speedup: 0.4970
Pass: 0
The analyzer looks for a numerical match and fails the verification, even though the generated multi-threaded MEX file is valid.
Speedup Limitations:
- If the entry-point MATLAB function contains code with low complexity, MATLAB overhead or multi-threaded MEX overhead overshadows any performance gains. In such cases, do not use dspunfold.
- If the number of operations in the input MATLAB function is small compared to the size of the input or output data, the multi-threaded MEX file does not provide any speedup gain. Sometimes, it can result in a speedup loss, even if the repetition value is increased. In such cases, do not use dspunfold.
More About
State length of the algorithm.
Most of the time, the state length used by dspunfold matches the state length of the algorithm in the entry-point MATLAB function. If the algorithm is simple, state length is easy to determine. For example, the state length of an FIR filter is the number of taps in the filter – 1. In some scenarios, to optimize speedup, dspunfold chooses a state length that is different from the algorithm state length or the state length specified using the -s option. For example, when the state length is greater than (threads – 1) × repetition frames, dspunfold considers the state length to be infinite. Also, multi-threading gets disabled due to performance considerations.
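As a quick check of the numbers involved, the sketch below computes the state length of an FIR filter and the (threads – 1) × repetition quantity mentioned above. The values are assumptions chosen to line up with the FilterNoise example on this page: a 13-tap filter (which is what fir1(12,0.4) designs) and 4 threads with the default repetition of 1, which is consistent with the 3-frame threshold reported in that example's warning. Nothing here is queried from dspunfold.

```matlab
% Assumed values for illustration (not read from dspunfold).
numTaps    = 13;  % fir1(12,0.4) designs an order-12 filter, i.e. 13 taps
threads    = 4;   % value of the -t option (assumed)
repetition = 1;   % value of the -r option (default)

stateLength   = numTaps - 1;                 % FIR state length, in samples
thresholdFrms = (threads - 1) * repetition;  % quantity from the text, in frames

fprintf('FIR state length: %d samples\n', stateLength);
fprintf('(threads - 1) x repetition: %d frames\n', thresholdFrms);
```

Note that the two results are in different units (samples and frames), so they are not directly comparable unless the state length is specified in frames.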
Automatic State Length Detection
You can automatically detect the minimum state length for which the outputs of the multi-threaded MEX and single-threaded MEX match.
In complex algorithms, it is not easy to determine the state length analytically. In such scenarios, use the analyzer to compute the state length. When you set -s to auto, dspunfold invokes the analyzer. The analyzer computes the outputs for different state lengths and detects the minimum state length for which the outputs of the multi-threaded MEX file and single-threaded MEX file match. The analyzer uses the numeric value of the inputs given to -args. To detect the most efficient state length, provide random inputs to -args. In this mode, you cannot input coder.typeof to arguments. Due to the extra analysis this tool requires, the time to generate the MEX file increases.
When you use automatic state length detection on an algorithm with code paths that depend on the input values, use inputs that choose the code path with the longest state length. Also, the inputs must have an immediate effect on the output. If inputs choose a code path that triggers runtime errors, automatic state length detection stops, and so does the analyzer. Make sure that the MATLAB function supports code generation and does not have run-time errors for the inputs under test. Before invoking dspunfold, call codegen on the entry-point MATLAB function. In addition, simulate the entry-point MATLAB function to make sure it has no run-time errors.
Threads
The -t option specifies the number of threads used by the multi-threaded MEX file. Increasing this value can improve the multi-threaded MEX speedup, at the cost of a larger latency. Decreasing this value reduces the latency and potentially decreases the multi-threaded MEX speedup.
Repetition Factor
Repetition factor is the number of consecutive frames processed by each thread in one processing step.
Increasing this value reduces the overhead per frame of data, potentially improving the speedup at the cost of larger latency. Decreasing this value reduces the latency, and potentially decreases the multi-threaded MEX speedup.
Self-Diagnostic Analyzer
The self-diagnostic analyzer function is a help tool that is generated with the MEX file. This function measures the speedup gain of the multi-threaded MEX file compared to the single-threaded MEX file. The analyzer function also verifies that the outputs of the multi-threaded MEX file and single-threaded MEX file match.
If you specify an incorrect state length value, the outputs usually do not match. To check for the numerical match between the multi-threaded MEX file and the single-threaded MEX file, provide at least two different frames for each input argument of the analyzer. The frames are appended along the first dimension. The analyzer alternates between these frames while verifying that the outputs match. Failure to provide multiple frames for each input can decrease the effectiveness of the analyzer and can lead to false positive verification results. In other words, the analyzer might produce pass = 1 results even when an incorrect state length value is specified. The analyzer alternates through a maximum of 3 × (2 × threads × repetition) frames. If your algorithm requires more than 3 × (2 × threads × repetition) frames to verify the results, then the analyzer cannot verify accurately.
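The following sketch shows one way to assemble an analyzer input that contains several different frames, and to compute the upper bound on the number of frames the analyzer alternates through. The frame size, thread count, repetition factor, and the analyzer name foo_analyzer are all assumptions for illustration; substitute the values you used when calling dspunfold.

```matlab
% Assumed settings; replace with the values used when calling dspunfold.
frameSize  = 10;  % first dimension of the input given to -args
threads    = 4;   % -t value (assumed)
repetition = 1;   % -r value (default)
numFrames  = 6;   % number of different frames to append

% Stack numFrames independent random frames along the first dimension.
x = randn(frameSize * numFrames, 1);

% Upper bound on the frames the analyzer alternates through.
maxFrames = 3 * (2 * threads * repetition);

% The first dimension must be a multiple of the frame size.
assert(mod(size(x, 1), frameSize) == 0);
fprintf('Providing %d frames (analyzer uses at most %d)\n', numFrames, maxFrames);

% With a generated analyzer named foo_analyzer (hypothetical name):
% report = foo_analyzer(x);
```

Because each frame is drawn independently from randn, the frames differ from one another, which is what the analyzer needs to avoid false positive pass results.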
Tips
General
- Do not display plots, scopes, or execute other user interface operations from within the multi-threaded MEX file. The generated MEX file can have unexpected behavior.
- Do not use coder.extrinsic inside the input MATLAB function. The generated MEX file can have unexpected behavior.
When the state length is less than or equal to (threads – 1) × repetition frames:
- Do not use a random number inside the MATLAB function. The outputs of the single-threaded MEX file and the multi-threaded MEX file might not match. Also, the outputs of the consecutive executions of the multi-threaded MEX file might not match. The analyzer might not pass the numerical match verification. It is recommended that you generate the random number outside the entry-point MATLAB function and pass it as an argument to the function.
- Do not use global or persistent variables anywhere other than in the entry-point MATLAB function. For example, avoid using persistent variables in subfunctions. The generated MEX file can produce inaccurate results. In general, global variables are not recommended.
- Do not access I/O resources from within the multi-threaded MEX file. The generated MEX file can have unexpected behavior. These resources include file writers and readers, UDP sockets, and audio players and recorders.
- Do not use functions with interactive inputs (for example, the keyboard) inside the multi-threaded MEX file. The generated MEX file can have unexpected behavior.
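To illustrate the recommendation about random numbers, here is a hedged sketch of how the FilterNoise function from the 'Analyzer Limitations' section could be restructured so that the noise is generated by the caller and passed in as an argument. The name FilterNoiseArg and its second input are hypothetical and do not appear in the original example; like FilterNoise, the sketch assumes that dsp.FIRFilter and fir1 are available.

```matlab
function Output = FilterNoiseArg(x, noise)
% Hypothetical variant of FilterNoise: the random noise is supplied by the
% caller instead of being generated inside the entry-point function.
persistent FIRFilter
if isempty(FIRFilter)
    % Same filter design as in the FilterNoise example
    FIRFilter = dsp.FIRFilter('Numerator', fir1(12,0.4));
end
Output = FIRFilter(x + noise);
end
```

With this structure, a call such as dspunfold FilterNoiseArg -args {randn(1000,1), randn(1000,1)} -s 12 -f true marks both inputs as frames of the same size. Because the multi-threaded and single-threaded MEX files then receive the same noise values, their outputs should be comparable numerically.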
Workflow
- To generate a valid multi-threaded MEX file with the required speedup and latency, follow the Workflow for Generating a Multithreaded MEX File using dspunfold.
- Before using dspunfold, call codegen on the entry-point MATLAB function and make sure that the function generates a MEX file successfully.
- After generating the multi-threaded MEX file using dspunfold, run the analyzer function. Make sure that the analyzer function passes. The exception to this rule is when the algorithm produces results that match statistically, but not numerically. In this exception, the analyzer function does not pass, even though the dspunfold function generates a valid multi-threaded MEX file. See 'Analyzer Limitations' for an example.
- For help on using the MEX file and analyzer, at the MATLAB command prompt, enter help _`<mexfile name>`_ and help _`<analyzer name>`_.
State Length
- If you choose a state length that is greater than or equal to the value of the exact state length, the analyzer passes. If the analyzer fails, increase the state length, regenerate the MEX file, and verify again.
- If the state length is greater than 0, the inputs marked as frames (through the -f option) must all have the same dimensions.
- When generating the MEX file and running the analyzer, use inputs that invoke the same state length.
Automatic State Length Detection
When you set -s to auto:
- If the algorithm in the entry-point MATLAB function chooses a code path based on the input values, use inputs that choose the code path with the longest state length.
- Provide random inputs to -args.
- Choose inputs that have an immediate effect on the output. See Why Does the Analyzer Choose a Zero State Length?.
Analyzer
- Make sure the outputs of the multi-threaded MEX file and the single-threaded MEX file do not contain NaN or Inf values. If they do, the analyzer cannot do numeric checks and returns pass as false. The automatic state length detection tool detects an infinite state length and displays this warning:
  Warning: The output results of the multi-threaded MEX file do not match the output results of the single-threaded MEX file even for Infinite state length. A possible reason is that input MATLAB function generates different output results between consecutive runs even for the same input values.
- Provide multiple frames with different values for each input of the analyzer. To improve the analyzer effectiveness, append successive frames along the first dimension.
- Provide inputs to the analyzer that lead to efficient code coverage.
Speedup
- To improve the speedup of the multi-threaded MEX file, specify the exact state length in samples. You can specify the state length in samples by setting at least one entry of frameinputs to true. The use of samples reduces the overhead and increases the speedup.
- To increase the speedup at the cost of larger latency, you can:
  - Increase the repetition factor. Use the -r option.
  - Increase the number of threads. Use the -t option.
- For each input that can be divided into samples without altering the algorithm behavior, set frame status to true using the -f option. The input is then considered in samples, which can increase the speedup of the generated multi-threaded MEX file.
Algorithms
The multi-threaded MEX file buffers multiple input signal frames into a buffer of 2 × threads × repetition frames, where threads is the number of threads, and repetition is the repetition factor. The MEX file processes these frames simultaneously, using multiple cores. This process introduces some deterministic latency, where latency = 2 × threads × repetition. Latency is traded off with the speedup you might gain by increasing the number of threads or the repetition factor.
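The latency formula can be evaluated directly to see how the -t and -r options trade latency for potential speedup. The sketch below is plain arithmetic; the thread and repetition values are assumptions picked for illustration, and latencyFrames is a helper name introduced here, not a dspunfold function.

```matlab
% Latency in frames, following latency = 2 x threads x repetition.
latencyFrames = @(threads, repetition) 2 * threads * repetition;

L1 = latencyFrames(4, 1);  % 4 threads, repetition factor 1
L2 = latencyFrames(4, 2);  % doubling the repetition factor
L3 = latencyFrames(8, 1);  % doubling the number of threads instead

fprintf('Latency: %d, %d, %d frames\n', L1, L2, L3);
```

The first case gives 8 frames, which agrees with the Latency = 8 frames reported by the analyzer in the FilterNoise example if that run used 4 threads and the default repetition factor. Doubling either parameter doubles the latency.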
Version History
Introduced in R2015b