Generate SIMD Code from MATLAB Functions for Intel Platforms
You can generate single instruction, multiple data (SIMD) code from certain MATLAB® functions by using Intel® SSE and, if you have Embedded Coder®, Intel AVX technology. SIMD is a computing paradigm in which a single instruction processes multiple data. Many modern processors have SIMD instructions that, for example, perform several additions or multiplications at once. For computationally intensive operations among supported functions, SIMD intrinsics can significantly improve the performance of the generated code on Intel platforms.
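As a minimal sketch of the idea, the following C fragment contrasts a scalar loop with a single SSE addition that handles four single-precision values at once. The function names add4_scalar and add4_simd are hypothetical and are not produced by the code generator; the intrinsics _mm_loadu_ps, _mm_add_ps, and _mm_storeu_ps are standard SSE intrinsics. The sketch assumes an x86 compiler and falls back to the scalar loop elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, ... */
#define HAVE_SSE 1
#endif

/* Hypothetical scalar version: one addition per loop iteration. */
void add4_scalar(const float a[4], const float b[4], float out[4])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Hypothetical SIMD version: one _mm_add_ps adds all four lanes.
 * Uses the scalar loop when SSE is not available. */
void add4_simd(const float a[4], const float b[4], float out[4])
{
    assert(a != NULL && b != NULL && out != NULL);
#ifdef HAVE_SSE
    __m128 va = _mm_loadu_ps(a); /* load four floats from a */
    __m128 vb = _mm_loadu_ps(b); /* load four floats from b */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    add4_scalar(a, b, out);
#endif
}
```

Both functions produce the same values; the SIMD version simply issues fewer arithmetic instructions for the same amount of data.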
To generate SIMD code by using the Embedded Coder Support Package for ARM® Cortex®-A Processors, see Generate SIMD Code from MATLAB Functions for ARM Platforms (Embedded Coder).
MATLAB Functions That Support SIMD Code for Intel
When certain conditions are met, you can generate SIMD code by using Intel SSE or Intel AVX technology. The following table lists MATLAB functions that support SIMD code generation. The table also details the conditions under which the support is available.
MATLAB Function | Conditions |
---|---|
plus | For AVX, SSE, and FMA, the input argument has a data type of single or double. For AVX2, SSE4.1, and SSE2, the input argument has a data type of single, double, int8, int16, int32, or int64. For AVX512F, the input argument has a data type of single, double, int32, or int64. For integer data types, the value of Saturate on integer overflow is set to No. |
minus | For AVX, SSE, and FMA, the input argument has a data type of single or double. For AVX2, SSE4.1, and SSE2, the input argument has a data type of single, double, int8, int16, int32, or int64. For AVX512F, the input argument has a data type of single, double, int32, or int64. For integer data types, the value of Saturate on integer overflow is set to No. |
times | For AVX, SSE, and FMA, the input argument has a data type of single or double. For AVX2, SSE4.1, and SSE2, the input argument has a data type of single, double, int16, or int32. For AVX512F, the input argument has a data type of single or double. For integer data types, the value of Saturate on integer overflow is set to No. |
rdivide | The input argument has a data type of single or double. |
sqrt | The input argument has a data type of single or double. |
ceil | For AVX2, AVX, SSE4.1, SSE2, and SSE, the input argument has a data type of single or double. AVX512F is not supported. |
floor | For AVX2, AVX, SSE4.1, SSE2, and SSE, the input argument has a data type of single or double. AVX512F is not supported. |
max | For AVX, SSE, and FMA, the input argument has a data type of single or double. For AVX2, SSE4.1, and SSE2, the input argument has a data type of single, double, int8, int16, or int32. For AVX512F, the input argument has a data type of single, double, or int32. For integer data types, the value of Saturate on integer overflow is set to No. |
min | For AVX, SSE, and FMA, the input argument has a data type of single or double. For AVX2, SSE4.1, and SSE2, the input argument has a data type of single, double, int8, int16, or int32. For AVX512F, the input argument has a data type of single, double, or int32. For integer data types, the value of Saturate on integer overflow is set to No. |
sum | For AVX, SSE, and FMA, the input argument has a data type of single or double. For AVX2, SSE4.1, and SSE2, the input argument has a data type of single, double, int8, int16, int32, or int64. For AVX512F, the input argument has a data type of single, double, int32, or int64. Optimize reductions is set to on. For integer data types, the value of Saturate on integer overflow is set to No. |
prod | For AVX, SSE, and FMA, the input argument has a data type of single or double. For AVX2, SSE4.1, and SSE2, the input argument has a data type of single, double, int16, or int32. For AVX512F, the input argument has a data type of single or double. Optimize reductions is set to on. For integer data types, the value of Saturate on integer overflow is set to No. |
bitand | For SSE2, the input argument has a data type of int8, int16, int32, or int64. For AVX2 and AVX512F, the input argument has a data type of int8, int16, int32, or int64. |
bitor | For SSE2, the input argument has a data type of int8, int16, int32, or int64. For AVX2 and AVX512F, the input argument has a data type of int8, int16, int32, or int64. |
bitxor | For SSE2, the input argument has a data type of int8, int16, int32, or int64. For AVX2 and AVX512F, the input argument has a data type of int8, int16, int32, or int64. |
bitshift | The input argument has a data type of int32 or int64. |
< | For SSE4.1, the input argument has a data type of single, double, or int32. For AVX, the input argument has a data type of single or double. For AVX512F, the input argument has a data type of single, double, int32, or int64. |
<= | For SSE4.1 and AVX, the input argument has a data type of single or double. For AVX512F, the input argument has a data type of single, double, int32, or int64. |
> | For SSE4.1, the input argument has a data type of single, double, or int32. For AVX, the input argument has a data type of single or double. For AVX2, the input argument has a data type of int32 or int64. For AVX512F, the input argument has a data type of single, double, int32, or int64. |
>= | For SSE4.1 and AVX, the input argument has a data type of single or double. For AVX512F, the input argument has a data type of single, double, int32, or int64. |
== | For SSE4.1, AVX, AVX2, and AVX512F, the input argument has a data type of single, double, int32, or int64. |
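Several rows in the table require Saturate on integer overflow to be set to No for integer data types. The following C sketch illustrates what that setting changes: with saturation, an out-of-range result is clamped to the limits of the type, and without it the result wraps around. The function names add_i8_wrap and add_i8_sat are hypothetical, and the link to the table condition is an interpretation rather than a statement from the code generator: wraparound is the behavior of a plain element-wise integer add, so no extra clamping logic is needed per element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapping add: results wrap modulo 2^8.
 * The final conversion to int8_t is implementation-defined in C,
 * but behaves as two's-complement wraparound on mainstream compilers. */
void add_i8_wrap(const int8_t *a, const int8_t *b, int8_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint8_t bits = (uint8_t)((uint8_t)a[i] + (uint8_t)b[i]);
        out[i] = (int8_t)bits;
    }
}

/* Hypothetical saturating add: results are clamped to [-128, 127]. */
void add_i8_sat(const int8_t *a, const int8_t *b, int8_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int sum = (int)a[i] + (int)b[i]; /* computed in a wider type */
        if (sum > INT8_MAX) {
            sum = INT8_MAX;
        } else if (sum < INT8_MIN) {
            sum = INT8_MIN;
        }
        out[i] = (int8_t)sum;
    }
}
```

The two functions agree whenever the true sum fits in int8 and differ only when it overflows.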
If you have DSP System Toolbox™, you can generate SIMD code from certain MATLAB System objects. For more information, see System objects in DSP System Toolbox that Support SIMD Code Generation (DSP System Toolbox).
Generate SIMD Code Versus Plain C Code
Consider the MATLAB function dynamic. This function consists of addition and multiplication operations between the variable-size arrays A and B. These arrays have a data type of single and an upper bound of 100 x 100.
function C = dynamic(A, B)
assert(all(size(A) <= [100 100]));
assert(all(size(B) <= [100 100]));
assert(isa(A, 'single'));
assert(isa(B, 'single'));

C = zeros(size(A), 'like', A);
for i = 1:numel(A)
    C(i) = (A(i) .* B(i)) + (A(i) .* B(i));
end
end
To generate plain C code at the command line:
- For C library code generation, create a code generation configuration object.
  cfg = coder.config('lib');
- To generate plain C code, set the InstructionSetExtensions property to None.
  cfg.InstructionSetExtensions = 'None';
- To generate a static library in the default location, codegen\lib\dynamic, use the codegen function.
  codegen('-config', cfg, 'dynamic');
- In the list of generated files, click dynamic.c. In the plain (non-SIMD) C code, each loop iteration produces one result.
void dynamic(const float A_data[], const int A_size[2], const float B_data[],
             const int B_size[2], float C_data[], int C_size[2])
{
  float C_data_tmp;
  int i;
  int loop_ub;
  (void)B_size;
  C_size[0] = (signed char)A_size[0];
  C_size[1] = (signed char)A_size[1];
  loop_ub = (signed char)A_size[0] * (signed char)A_size[1];
  if (0 <= loop_ub - 1) {
    memset(&C_data[0], 0, loop_ub * sizeof(float));
  }
  loop_ub = A_size[0] * A_size[1];
  for (i = 0; i < loop_ub; i++) {
    C_data_tmp = A_data[i] * B_data[i];
    C_data[i] = C_data_tmp + C_data_tmp;
  }
}
To generate SIMD C code at the command line:
- For C library code generation, use the coder.config function to create a code generation configuration object.
  cfg = coder.config('lib');
- Set the coder.HardwareImplementation object TargetHWDeviceType property to 'Intel->x86-64 (Linux 64)' or 'Intel->x86-64 (Windows64)'.
  cfg.HardwareImplementation.TargetHWDeviceType = 'Intel->x86-64 (Windows64)';
- Set the coder.HardwareImplementation object ProdHWDeviceType property to 'Intel->x86-64 (Linux 64)' or 'Intel->x86-64 (Windows64)'.
  cfg.HardwareImplementation.ProdHWDeviceType = 'Intel->x86-64 (Windows64)';
  If you are using the MATLAB Coder app to generate code:
  - Set the Hardware Board parameter to None-Select device below.
  - Set the Device vendor parameter to Intel, AMD, or Generic.
  - Set the Device type to x86-64 (Linux 64), x86-64 (Windows64), or MATLAB Host Computer.
- Set the InstructionSetExtensions property to an instruction set extension that your processor supports. This example uses SSE2 for Windows.
  cfg.InstructionSetExtensions = 'SSE2';
  The library that you choose depends on which instruction set extension your processor supports. If you use Embedded Coder, you can also select from the instruction sets SSE, SSE4.1, AVX, AVX2, FMA, and AVX512F.
  For more information, see https://www.intel.com/content/www/us/en/support/articles/000005779/processors.html.
  If you do not select an instruction set, the code generator uses the default setting Auto, which resolves to an instruction set according to your hardware:
  - MATLAB Host Computer, Intel, or AMD® - SSE2
  - ARM - None
  If you are using the MATLAB Coder app to generate code, on the Speed tab, set the Leverage target hardware instruction set extensions parameter to an instruction set that your processor supports.
- Optionally, select the OptimizeReductions parameter to generate SIMD code for reduction operations such as sum and product functions.
  cfg.OptimizeReductions = 'on';
  If you are using the MATLAB Coder app to generate code, on the Speed tab, select the Optimize reductions parameter.
- Use the codegen function to generate a static library in the default location, codegen\lib\dynamic.
  codegen('-config', cfg, 'dynamic');
- In the list of generated files, click dynamic.c.
void dynamic(const float A_data[], const int A_size[2], const float B_data[],
             const int B_size[2], float C_data[], int C_size[2])
{
  __m128 r;
  float C_data_tmp;
  int i;
  int loop_ub;
  int scalarLB;
  int vectorUB;
  (void)B_size;
  C_size[0] = (signed char)A_size[0];
  C_size[1] = (signed char)A_size[1];
  loop_ub = (signed char)A_size[0] * (signed char)A_size[1];
  if (0 <= loop_ub - 1) {
    memset(&C_data[0], 0, loop_ub * sizeof(float));
  }
  loop_ub = A_size[0] * A_size[1];
  scalarLB = (loop_ub / 4) << 2;
  vectorUB = scalarLB - 4;
  for (i = 0; i <= vectorUB; i += 4) {
    r = _mm_mul_ps(_mm_loadu_ps(&A_data[i]), _mm_loadu_ps(&B_data[i]));
    _mm_storeu_ps(&C_data[i], _mm_add_ps(r, r));
  }
  for (i = scalarLB; i < loop_ub; i++) {
    C_data_tmp = A_data[i] * B_data[i];
    C_data[i] = C_data_tmp + C_data_tmp;
  }
}
The SIMD instructions are the intrinsic functions that start with the identifier _mm. These functions process multiple data elements in a single iteration of the loop because the loop increments by four for single data types. For double data types, the loop increments by two. For MATLAB code that processes more data and is more computationally intensive than the code in this example, the presence of SIMD instructions can significantly speed up the code execution time.
The second for loop is in the generated code because the for loop that contains SIMD code processes four elements per iteration for single data types, so it can cover only a multiple of four elements. The second loop processes the remaining elements one at a time.
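The split between the vector loop and the remainder loop can be sketched with a small helper that mirrors the scalarLB computation in the generated code. The LoopSplit type and split_loop function are hypothetical names used only for illustration. For non-negative values, the expression (loop_ub / 4) << 2 in the generated code is the same as (loop_ub / 4) * 4; the sketch generalizes the constant to a lanes parameter, which would be four for single and two for double with 128-bit registers.

```c
#include <assert.h>

/* Hypothetical summary of how a loop of loop_ub elements is divided. */
typedef struct {
    int scalarLB;     /* first index handled by the remainder loop */
    int vector_iters; /* iterations of the SIMD loop */
    int scalar_iters; /* iterations of the remainder loop */
} LoopSplit;

/* lanes is the number of elements one SIMD instruction processes. */
LoopSplit split_loop(int loop_ub, int lanes)
{
    LoopSplit s;
    assert(loop_ub >= 0 && lanes > 0);
    s.scalarLB = (loop_ub / lanes) * lanes; /* round down to a multiple */
    s.vector_iters = s.scalarLB / lanes;
    s.scalar_iters = loop_ub - s.scalarLB;  /* always less than lanes */
    return s;
}
```

For example, ten single-precision elements give two SIMD iterations that cover indices 0 through 7, followed by two scalar iterations for indices 8 and 9.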
For a list of Intel intrinsic functions for supported MATLAB functions, see https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html.
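The OptimizeReductions step above applies to reductions such as sum. As a rough sketch of what a SIMD reduction can look like, the following hand-written C code keeps four partial sums in one SSE register, combines them after the vector loop, and then adds the remaining elements. This is not the code that the code generator emits; sum_scalar and sum_simd are hypothetical names, and the sketch assumes an x86 compiler with a scalar fallback elsewhere. Because the additions happen in a different order than in a scalar loop, floating-point results can differ slightly in general.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h> /* SSE intrinsics */
#define HAVE_SSE 1
#endif

/* Hypothetical scalar reduction: one running total. */
float sum_scalar(const float *x, int n)
{
    float acc = 0.0f;
    assert(x != NULL || n <= 0);
    for (int i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Hypothetical SIMD reduction: four partial sums, then a remainder loop. */
float sum_simd(const float *x, int n)
{
    float acc = 0.0f;
    int i = 0;
    assert(x != NULL || n <= 0);
#ifdef HAVE_SSE
    int scalarLB = (n / 4) * 4;       /* elements covered by the vector loop */
    __m128 vacc = _mm_setzero_ps();   /* four partial sums, all zero */
    float lanes[4];
    for (; i < scalarLB; i += 4) {
        vacc = _mm_add_ps(vacc, _mm_loadu_ps(&x[i]));
    }
    _mm_storeu_ps(lanes, vacc);       /* spill the partial sums */
    acc = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    for (; i < n; i++) {              /* remaining elements, or all of them */
        acc += x[i];
    }
    return acc;
}
```

The inputs in a quick check should be chosen so that every intermediate sum is exactly representable, for example small integers, so that both versions return identical values.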
Generate SIMD Instructions for MEX Code
To generate SIMD instructions in MEX code, set the MEX configuration parameter Hardware SIMD acceleration to one of these acceleration levels.
- Full — AVX2
- Portable — SSE2
- None — No instruction sets
MATLAB Coder checks your specified hardware and compiler and uses the compatible intrinsics up to the level that you specify according to this list.
To generate SIMD instructions for MEX code at the command line, use the coder.MexCodeConfig object and set the 'SIMDAcceleration' property to 'Full', 'Portable', or 'None'.
Limitations
The generated code does not contain SIMD code when the MATLAB code meets these conditions:
- Scalar operations outside a loop. For example, if a, b, and c are scalars, the generated code does not contain SIMD code for an operation such as c=a+b.
- Indirectly indexed arrays or matrices. For example, if A, B, C, and D are vectors, the generated code does not contain SIMD code for an operation such as D(A)=C(A)+B(A).
- Parallel for-Loops (parfor). The parfor loop does not contain SIMD code, but loops within the body of the parfor loop might contain SIMD code.
- Polyspace® does not support analysis of generated code that includes SIMD instructions. Disable SIMD code generation by setting the Leverage target hardware instruction set extensions parameter to None.
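The indirect-indexing limitation can be illustrated with a C sketch of the two access patterns. In the contiguous case, consecutive iterations touch adjacent memory, which is the pattern that loads such as _mm_loadu_ps rely on because they read adjacent elements starting at one address. In the indirect case, each element location comes from an index vector, so adjacent iterations can touch unrelated addresses. The function names add_contiguous and add_indirect are hypothetical, the index vector is zero-based here whereas MATLAB indices are one-based, and the explanation is a likely reason for the limitation rather than one stated by the code generator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contiguous form, D = C + B: element i sits next to i + 1. */
void add_contiguous(const float *B, const float *C, float *D, size_t n)
{
    assert(B != NULL && C != NULL && D != NULL);
    for (size_t i = 0; i < n; i++) {
        D[i] = C[i] + B[i];
    }
}

/* Hypothetical indirect form, D(A) = C(A) + B(A): the position of each
 * element is looked up in idx, so accesses need not be adjacent. */
void add_indirect(const size_t *idx, size_t n, const float *B,
                  const float *C, float *D)
{
    assert(idx != NULL && B != NULL && C != NULL && D != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t k = idx[i]; /* zero-based position chosen at run time */
        D[k] = C[k] + B[k];
    }
}
```

In the indirect form, only the positions listed in idx are written; all other elements of D keep their previous values.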
Related Topics
- Generate C Code by Using the MATLAB Coder App
- Generate C Code at the Command Line
- Generate SIMD Code from Simulink Blocks for Intel Platforms (Embedded Coder)
- Generate SIMD Code from MATLAB Functions for ARM Platforms (Embedded Coder)
- Use Intel AVX2 Code Replacement Library to Generate SIMD Code from MATLAB Algorithms (DSP System Toolbox)
- Use Intel AVX2 Code Replacement Library to Generate SIMD Code from Simulink Blocks (DSP System Toolbox)