Run Single Programs on Multiple Data Sets - MATLAB & Simulink
Introduction
The single program multiple data (SPMD) language construct allows seamless interleaving of serial and parallel programming. The spmd statement lets you define a block of code to run simultaneously on multiple workers. Variables assigned inside the spmd statement on the workers can be accessed directly from the client, by reference, through Composite objects.
This chapter explains some of the characteristics of spmd statements and Composite objects.
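As a minimal sketch of how Composite objects behave, assuming a parallel pool is open (or can be opened with default settings), the following assigns a variable on the workers and then reads and updates it from the client. The variable names are illustrative only.

```matlab
% Assign a different value to x on each worker.
spmd
    x = 10 * spmdIndex;
end

% On the client, x is a Composite. Index it with braces by worker number.
firstValue = x{1};      % value held by worker 1, which is 10

% Assigning to an element from the client changes the value on that worker.
x{1} = 99;
spmd
    y = x + 1;          % inside spmd, x is the plain local value again
end
updatedValue = y{1};    % 100 on worker 1
```

The values stay on the workers between spmd blocks as long as the pool remains open, which is why the second block can see the update.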
When to Use spmd
The “single program” aspect of SPMD means that the identical code runs on multiple workers. You run one program in the MATLAB® client, and those parts of it labeled as spmd blocks run on the workers. When the spmd block is complete, your program continues running in the client.
The “multiple data” aspect means that even though the spmd statement runs identical code on all workers, each worker can have different, unique data for that code. So multiple data sets can be accommodated by multiple workers.
Typical applications appropriate for spmd are those that run a program simultaneously on multiple data sets and require communication or synchronization between the workers. Some common cases are:
- Programs that take a long time to execute: spmd lets several workers compute solutions simultaneously.
- Programs operating on large data sets: spmd lets the data be distributed to multiple workers.
For more information, see Choose Between spmd, parfor, and parfeval.
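The following sketch illustrates the combination of "same program, different data, plus communication" described above. It assumes an open parallel pool and uses spmdReduce to combine partial results. The data and variable names are made up for illustration.

```matlab
data = 1:1000;                      % illustrative data set defined on the client

spmd
    % Each worker takes its own interleaved slice of the data.
    localData = data(spmdIndex:spmdSize:end);
    localSum  = sum(localData);

    % Workers communicate to combine their partial sums.
    % Every worker receives the same total.
    total = spmdReduce(@plus, localSum);
end

grandTotal = total{1};              % 500500, regardless of the number of workers
```

Because the workers exchange data inside the block, this kind of computation fits spmd better than a parfor-loop, whose iterations cannot communicate with each other.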
Define an spmd Statement
The general form of an spmd statement is:
spmd
    <statements>
end
Note
If a parallel pool is not running and you are using the default parallel settings, spmd creates a pool using your default cluster profile.
The block of code represented by <statements> executes in parallel simultaneously on all workers in the parallel pool. If you want to limit the execution to only a portion of these workers, specify exactly how many workers to run on:
spmd (n)
    <statements>
end
This statement requires that n workers run the spmd code. n must be less than or equal to the number of workers in the open parallel pool. If the pool is large enough, but n workers are not available, the statement waits until enough workers are available. If n is 0, the spmd statement uses no workers and runs locally on the client, the same as if no pool were running.
You can specify a range for the number of workers:
spmd (m,n)
    <statements>
end
In this case, the spmd statement requires a minimum of m workers, and it uses a maximum of n workers.
If it is important to control the number of workers that execute your spmd statement, set the exact number in the cluster profile or with the spmd statement, rather than using a range.
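To see how many workers a range actually resolved to, one option is to record spmdSize inside the block. This is only a sketch, assuming an open pool with at least one worker. The variable names are hypothetical.

```matlab
% Request between 1 and 2 workers and record how many were used.
spmd (1, 2)
    nUsed = spmdSize;       % number of workers running this block
end

numWorkersUsed = nUsed{1};  % read the value from worker 1 on the client
```

With a range, the value depends on the pool, which is why an exact count is preferable when the number of workers matters.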
For example, create a random matrix on three workers:
spmd (3)
    R = rand(4,4);
end
Note
All subsequent examples in this chapter assume that a parallel pool is open and remains open between sequences of spmd statements.
Unlike a parfor-loop, the workers used for an spmd statement each have a unique value for spmdIndex. This lets you specify code to be run on only certain workers, or to customize execution, usually for the purpose of accessing unique data.
For example, create different sized arrays depending on spmdIndex:
spmd (3)
    if spmdIndex==1
        R = rand(9,9);
    else
        R = rand(4,4);
    end
end
Load unique data on each worker according to spmdIndex, and use the same function on each worker to compute a result from the data:
spmd (3)
    labdata = load(['datafile_' num2str(spmdIndex) '.ascii'])
    result = MyFunction(labdata)
end
The workers executing an spmd statement operate simultaneously and are aware of each other. As with a communicating job, you are allowed to directly control communications between the workers, transfer data between them, and use codistributed arrays among them.
For example, use a codistributed array in an spmd statement:
spmd (3)
    RR = rand(30, codistributor());
end
Each worker has a 30-by-10 segment of the codistributed array RR. For more information about codistributed arrays, see Working with Codistributed Arrays.
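One way to confirm the segment size on each worker is to inspect the local part of the array. The sketch below assumes a pool with at least three workers and the default codistributor. The names localSize and globalSize are illustrative.

```matlab
spmd (3)
    RR = rand(30, codistributor());      % 30-by-30 array spread over 3 workers

    % getLocalPart returns the portion stored on this worker.
    localSize  = size(getLocalPart(RR)); % [30 10] on each worker
    globalSize = size(RR);               % [30 30], the size of the whole array
end

sizeOnWorker1 = localSize{1};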
Display Output
When running an spmd statement on a parallel pool, all command-line output from the workers displays in the client Command Window. Because the workers are MATLAB sessions without displays, any graphical output (for example, figure windows) from the pool does not display at all.
MATLAB Path
All workers executing an spmd statement must have the same MATLAB search path as the client, so that they can execute any functions called in their common block of code. Therefore, whenever you use cd, addpath, or rmpath on the client, it also executes on all the workers, if possible. For more information, see the parpool reference page. When the workers are running on a different platform than the client, use the function pctRunOnAll to properly set the MATLAB path on all workers.
Error Handling
When an error occurs on a worker during the execution of an spmd statement, the error is reported to the client. The client tries to interrupt execution on all workers, and throws an error to the user.
Errors and warnings produced on workers are annotated with the worker ID (spmdIndex) and displayed in the client's Command Window in the order in which they are received by the MATLAB client.
The behavior of lastwarn is unspecified at the end of an spmd if used within its body.
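Because the error is rethrown on the client, it can be handled there with an ordinary try/catch. The following is a minimal sketch, assuming an open pool. The error identifier demo:workerFailure and the message text are invented for this example.

```matlab
caught = false;
try
    spmd
        if spmdIndex == 1
            % Simulate a failure on one worker only.
            error("demo:workerFailure", ...
                "Simulated failure on worker %d.", spmdIndex);
        end
    end
catch err
    % The client receives the error after the workers are interrupted.
    caught = true;
    disp(err.message);
end
```

The exact message and identifier seen on the client may wrap the original worker error, so the sketch only displays the message rather than matching on it.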
spmd Limitations
Nested Functions
Inside a function, the body of an spmd statement cannot reference a nested function. However, it can call a nested function by means of a variable defined as a function handle to the nested function.
Because the spmd body executes on workers, variables that are updated by nested functions called inside an spmd statement are not updated in the workspace of the outer function.
Nested spmd Statements
The body of an spmd statement cannot directly contain another spmd. However, it can call a function that contains another spmd statement. The inner spmd statement does not run in parallel in another parallel pool, but runs serially in a single thread on the worker running its containing function.
Nested parfor-Loops
An spmd statement cannot contain a parfor-loop, and the body of a parfor-loop cannot contain an spmd statement.
break, continue, and return Statements
The body of an spmd statement cannot contain break, continue, or return statements. If you need to stop execution early, consider parfeval or parfevalOnAll instead of spmd, because you can use cancel on them.
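As a rough illustration of the parfeval alternative, the sketch below starts a long-running task and then cancels it from the client. It assumes gcp can return or create a pool with default settings. The call to pause is only a stand-in for real work.

```matlab
pool = gcp;                          % get the current pool (creates one by default)

% Start a task asynchronously; pause(30) stands in for long-running work.
f = parfeval(pool, @pause, 0, 30);

% Stop the task early from the client.
cancel(f);

finalState = f.State;                % state of the future after cancellation
```

After cancellation, the Error property of the future describes why it did not complete normally.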
Global and Persistent Variables
The body of an spmd statement cannot contain global or persistent variable declarations. The reason is that these variables are not synchronized between workers. You can use global or persistent variables within functions, but their value is only visible to the worker that creates them. Instead of global variables, it is a better practice to use function arguments to share values.
Anonymous Functions
The body of an spmd statement cannot define an anonymous function. However, it can reference an anonymous function by means of a function handle.
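For instance, the anonymous function can be defined on the client before the block and then called through its handle inside the block. This is a small sketch assuming an open pool. The handle name square is illustrative.

```matlab
% Define the anonymous function on the client, outside the spmd block.
square = @(v) v.^2;

spmd
    % Inside the block, call it through the function handle.
    sq = square(spmdIndex);
end

firstSquare = sq{1};    % worker 1 computes 1^2 = 1
```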
inputname Functions
Using inputname to return the workspace variable name corresponding to an argument number is not supported inside spmd. The reason is that spmd workers do not have access to the workspace of the MATLAB desktop. To work around this, call inputname before spmd, as shown in the following example.
a = 'a';
myFunction(a)

function X = myFunction(a)
    name = inputname(1);
    spmd
        X.(name) = spmdIndex;
    end
    X = [X{:}];
end
load Functions
The syntaxes of load that do not assign to an output structure are not supported inside spmd statements. Inside spmd, always assign the output of load to a structure.
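A minimal sketch of the supported pattern follows. It assumes a local pool whose workers can read the client's temporary folder. The file name spmd_load_demo.mat and the variable names are made up for this example.

```matlab
% Create a small MAT-file on the client (illustrative file name).
fname = fullfile(tempdir, 'spmd_load_demo.mat');
data = magic(4);
save(fname, 'data');

spmd
    % Assign the output of load to a structure, then read fields from it.
    s = load(fname);
    total = sum(s.data, 'all');
end

totalOnWorker1 = total{1};   % 136, the sum of the entries of magic(4)
```

Calling load(fname) without an output inside the block would try to create variables directly in the workspace, which is the unsupported form.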
nargin or nargout Functions
The following uses are not supported inside spmd statements:
- Using nargin or nargout without a function argument
- Using narginchk or nargoutchk to validate the number of input or output arguments in a call to the function that is currently executing
The reason is that workers do not have access to the workspace of the MATLAB desktop. To work around this, call these functions before spmd.
myFunction('a','b')

function myFunction(a,b)
    nin = nargin;
    spmd
        X = spmdIndex*nin;
    end
end
P-Code Scripts
You can call P-code script files from within an spmd statement, but P-code scripts cannot contain an spmd statement. To work around this, use a P-code function instead of a P-code script.
ans Variable
References to the ans variable defined outside an spmd statement are not supported inside the spmd statement. Inside the body of an spmd statement, you must assign the ans variable before you use it.
Composites and Distributed Variables in Data Containers
Composites and distributed arrays must appear as their own top-level variables within an spmd statement and must not be hidden inside other data containers such as structures, cell arrays, or objects.
In this example, you store the Composite object C in a cell array Y, which you then use as an input variable in an spmd statement. As a result, MATLAB issues a warning at run time:
spmd; C = 5; end
Y = {C};
spmd
    disp(Y)
end
Similarly, you cannot use Composite or distributed arrays stored in an object such as a dictionary in an spmd statement. In this example, dd is an invalid distributed array and CC is an invalid Composite:
spmd; C = 5; d = ones(7,"codistributed"); end
X = dictionary(["dist","comp"],{d,C});
spmd
    dd = X{"dist"}
    CC = X{"comp"}
end
As a workaround, extract the Composite or distributed array from the data structure and assign it to a separate variable before the spmd statement:
spmd; C = 5; d = ones(7,"codistributed"); end
X = dictionary(["dist","comp"],{d,C});
dd = X{"dist"};
CC = X{"comp"};
spmd
    disp(dd)
    disp(CC)
end
See Also
spmd | parfor | parfeval | parfevalOnAll | distributed | Composite