Choose How to Manage Data in Parallel Computing - MATLAB & Simulink
To perform parallel computations, you need to manage data access and transfer between your MATLAB® client and the parallel workers. Use this page to decide how to transfer data between the client and workers. You can manage data such as files, MATLAB variables, and handle-type resources.
Determine Your Data Management Approach
The best techniques for managing data depend on your parallel application. Use the following tables to find your goal and discover the appropriate data management functions and their key features. In some cases, more than one type of object or function might meet your requirements, and you can choose between them based on your workflow.
Transfer Data from Client to Workers
Use this table to identify some goals for transferring data from the client to workers and discover recommended workflows.
Goal | Recommended Workflow |
---|---|
Use variables in your MATLAB workspace in an interactive parallel pool. | The parfor and spmd functions automatically transfer variables in the client workspace to workers. To send variables to workers in a parfeval computation, you must specify variables as input arguments in the parfeval function call. |
Transfer variables in your MATLAB workspace to workers on a cluster in a batch workflow. | Pass variables as input arguments to the batch function. |
Give workers access to large data stored on your desktop. | To give workers in a parallel pool access to large data, save the data in a parallel.pool.Constant object (see the sketch after this table). To give workers in a batch job created with the batch function access to large data, pass the data as an argument to the batch function. To give workers in a batch job created with the createJob function access to large data, you can add the data to the job's ValueStore object before you submit the job tasks. |
Access large amounts of data or large files stored in the cloud and process them in an onsite or cloud cluster. | Use datastore with tall and distributed arrays to access and process data that does not fit in memory. |
Give workers access to files stored on the client computer. | For workers in a parallel pool: If the files are small or contain live data, you can specify files to send to workers using the addAttachedFiles function. If the files are large or contain static data, you can reduce data transfer overheads by moving the files to the cluster storage. Use the addpath function to add their location to the workers' search paths. For workers running batch jobs: If the files are small or are frequently modified, you can let MATLAB determine which files to send to workers by setting the AutoAttachFiles property of the job to true. You can check whether AutoAttachFiles has picked up all the file dependencies by running the listAutoAttachedFiles function. You can also specify files to send to workers using the AttachedFiles property of the job. If the files are large or are not frequently modified, you can reduce data transfer overheads by moving the files to the cluster storage and using the AdditionalPaths property of the job to specify their location. You must ensure that the workers have access to the cluster storage location. |
Access custom MATLAB functions or libraries that are stored on the cluster. | Specify paths to the libraries or functions using the AdditionalPaths property of a parallel job. |
Allow workers in a parallel pool to access non-copyable resources such as database connections or file handles. | Use parallel.pool.Constant objects to manage handle-type resources such as database connections or file handles across pool workers. |
Send a message to a worker in an interactive pool running a function. | Create a parallel.pool.PollableDataQueue object that the worker can poll to receive data. For an example of this workflow, see Send Messages to Workers Using Pollable Data Queues. (since R2025a) Before R2025a: Create a parallel.pool.PollableDataQueue object at the worker, and send this object back to the client. Then you can use the PollableDataQueue object to send a message to the worker. For an example of this workflow, see Receive Communication on Workers. |
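For illustration, here is a minimal sketch of the parallel.pool.Constant workflow for sharing large data with pool workers. The variable names and data are placeholders, not part of the original example set.

```matlab
% Minimal sketch: share one large dataset with all pool workers instead of
% copying it into every parfor iteration. bigData is a placeholder dataset.
bigData = rand(5000);                  % example large matrix
C = parallel.pool.Constant(bigData);   % transferred to each worker once
colSums = zeros(1,10);
parfor i = 1:10
    colSums(i) = sum(C.Value(:,i));    % workers read the shared copy
end
```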
Transfer Data Between Workers
Use this table to identify some goals for transferring data between workers and discover recommended workflows.
Goal | Recommended Workflow |
---|---|
Coordinate data transfer between workers as part of a parallel pipeline application. Communicate between workers with the Message Passing Interface (MPI). | In an interactive parallel pool, use parallel.pool.PollableDataQueue to transfer messages and data between workers. For an example of this workflow, see Transfer Data Between Workers Using Pollable Data Queues. _(since R2025a)_ Use the spmdSend, spmdReceive, spmdSendReceive, and spmdBarrier functions to communicate between workers in an spmd block. These functions use the Message Passing Interface (MPI) to send and receive data between workers. For a sketch of the spmd workflow, see the example after this table. |
Offload results from workers so that another worker can process them. | Store the data in the ValueStore object of the job or parallel pool. Multiple workers can read and write to the ValueStore object, which is stored on a shared file system accessible by the client and all workers. |
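As a sketch of the spmd messaging workflow, the following example sends a vector from worker 1 to worker 2. It assumes an open parallel pool with at least two workers; the payload is illustrative.

```matlab
% Minimal sketch: point-to-point communication inside an spmd block.
% Worker 1 sends a vector to worker 2 over the MPI-based interface.
spmd
    if spmdIndex == 1
        spmdSend(rand(1,5),2);    % send data to worker 2
    elseif spmdIndex == 2
        data = spmdReceive(1);    % block until worker 1's data arrives
        fprintf("Received %d values\n",numel(data));
    end
end
```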
Transfer Data from Workers to Client
Use this table to identify some goals for transferring data from a worker to a client and discover recommended workflows.
Goal | Recommended Workflow |
---|---|
Retrieve results from a parfeval calculation. | Apply the fetchOutputs (parfeval) function to the parfeval Future object. For a sketch of this workflow, see the example after this table. |
Retrieve large results at the client. | Store the data in the ValueStore object of the job or parallel pool. Multiple workers can read and write to the ValueStore object, which is stored on a shared file system accessible by the client and all workers. |
Transfer a large file to the client.Transfer files created during a batch execution back to the client. | Use the FileStore object of the parallel pool or job to store the files. Workers can read and write to the FileStore object, which is stored on a shared file system accessible by the client and all workers. |
Fetch the results from a parallel job. | Apply the fetchOutputs (Job) function to the job object to retrieve all the output arguments from all tasks in a job. |
Load the workspace variables from a batch job running a script or expression. | Apply the load function to the job object to load all the workspace variables on the workers. |
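For illustration, this minimal sketch shows two of the retrieval patterns from the table: fetchOutputs for a parfeval Future and load for a batch job. The script name myScript is a placeholder.

```matlab
% Minimal sketch: two common ways to bring results back to the client.
% Retrieve the result of an asynchronous parfeval computation:
f = parfeval(@magic,1,4);   % request one output from magic(4)
M = fetchOutputs(f);        % blocks until the future finishes

% Load workspace variables from a batch job that runs a script;
% "myScript" is a placeholder for your own script name:
job = batch("myScript");
wait(job)
load(job)                   % copies the job workspace variables to the client
```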
Transfer Data from Workers to Client During Execution
Use this table to identify some goals for transferring data from a worker during execution and discover recommended workflows.
Goal | Recommended Workflow |
---|---|
Inspect results from parfor or parfeval calculations in an interactive parallel pool. | Use a PollableDataQueue to send results to the client during execution. |
Update a plot, progress bar, or other user interface with data from a function running in an interactive parallel pool. | Send the data to the client with a parallel.pool.DataQueue and use afterEach to run a function that updates the user interface when new data is received (see the sketch after this table). For very large computations with thousands of calls to the afterEach update function, you might want to turn off visualizations. Visualizing results can be very useful, but you can observe some performance degradation when you scale up to large calculations. |
Collect data asynchronously to update a plot, progress bar, or other user interface with data from a parfeval calculation. | Use afterEach to schedule a callback function that updates the user interface after a Future object finishes. |
Track the progress of a job. Retrieve some intermediate results while a job is running. | Store the data in the ValueStore object of the job. Use the KeyUpdatedFcn or the KeyRemovedFcn properties of the ValueStore object to run a callback function that updates a user interface at the client when data is added to or removed from the ValueStore. |
Send a large file to the client. Transfer files created during a batch execution back to the client. | Store the files in the FileStore object of the job. Use the KeyUpdatedFcn or the KeyRemovedFcn properties of the FileStore object to run a callback function that sends files to the client when files are added to or removed from the FileStore. |
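As a minimal sketch of the DataQueue pattern from the table above, this example streams progress messages from parfor to the client; the pause call stands in for real work.

```matlab
% Minimal sketch: report progress from parfor while it runs. The afterEach
% callback executes on the client each time a worker calls send.
q = parallel.pool.DataQueue;
afterEach(q,@(i) fprintf("Finished iteration %d\n",i));
parfor i = 1:20
    pause(0.5);   % placeholder for real work
    send(q,i);    % notify the client during execution
end
```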
Compare Data Management Functions and Objects
Some parallel computing objects and functions that manage data have similar features. This section compares those objects and functions to help you choose between them.
DataQueue vs. ValueStore
DataQueue and ValueStore are two objects in Parallel Computing Toolbox™ that you can use to transfer data between the client and workers. The DataQueue object passes data from workers to the client in first-in, first-out (FIFO) order, while ValueStore stores data that multiple workers, as well as the client, can access and update. You can use both objects for asynchronous data transfer to the client. However, DataQueue is supported only on interactive parallel pools.
The choice between DataQueue and ValueStore depends on the data access pattern your parallel application requires. If you have many independent tasks that workers can execute in any order, and you want to pass data to the client in a streaming fashion, use a DataQueue object. However, if you want to store values, share them with multiple workers, and access or update them at any time, use ValueStore instead.
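For illustration, here is a minimal sketch of the ValueStore access pattern, assuming an open parallel pool. The key name threshold is a placeholder.

```matlab
% Minimal sketch: unlike a DataQueue, a ValueStore holds keyed values that
% the client and workers can read or update at any time.
pool = gcp;
store = pool.ValueStore;
store("threshold") = 0.5;           % client writes a shared value
parfor i = 1:4
    ws = getCurrentValueStore();    % the same store, accessed on a worker
    t = ws("threshold");            % read the current value at any time
    fprintf("Worker read threshold %.2f\n",t);
end
```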
fetchOutputs (parfeval) vs. ValueStore
Use the fetchOutputs function to retrieve the output arguments of a Future object, which the software returns when you run a parfeval or parfevalOnAll computation. fetchOutputs blocks the client until the computation is complete, then sends the results of the parfeval or parfevalOnAll computation to the client. In contrast, you can use ValueStore to store and retrieve values from any parallel computation, and also to retrieve intermediate results as they are produced without blocking the program. Additionally, the ValueStore object is not held in system memory, so you can store large results in the ValueStore. However, be careful when storing large amounts of data to avoid filling up the disk space on the cluster.
If you only need to retrieve the output of a parfeval or parfevalOnAll computation, then fetchOutputs is the simpler option. However, if you want to store and access the results of multiple independent parallel computations, then use ValueStore. In cases where you have multiple parfeval computations generating large amounts of data, using the pool ValueStore object can help avoid memory issues on the client. You can temporarily save the results in the ValueStore and retrieve them when you need them.
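As a minimal sketch of this pattern, the following example writes a large parfeval result to the pool ValueStore instead of returning it to the client. The key name and the stored result are placeholders.

```matlab
% Minimal sketch: keep a large parfeval result out of client memory by
% writing it to the pool ValueStore. The key "bigResult" is a placeholder.
pool = gcp;
f = parfeval(pool,@computeAndStore,0);   % request zero output arguments
wait(f)
store = pool.ValueStore;
r = store("bigResult");                  % retrieve only when needed

function computeAndStore()
    store = getCurrentValueStore();      % ValueStore of the running pool
    store("bigResult") = rand(1e4);      % placeholder for an expensive result
end
```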
load and fetchOutputs (Jobs) vs. ValueStore
load, fetchOutputs (Jobs), and ValueStore provide different ways of transferring data from jobs back to the client.
load retrieves the variables related to a job you create when you use the batch function to run a script or an expression. This includes any input arguments you provide and temporary variables the workers create during the computation. load does not retrieve the variables from batch jobs that run a function, and you cannot retrieve results while the job is running. fetchOutputs (Jobs) retrieves the output arguments contained in the tasks of a finished job you create using the batch, createJob, or createCommunicatingJob functions. If the job is still running when you call fetchOutputs (Jobs), the function returns an error.
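For illustration, this minimal sketch creates an independent job with createJob and retrieves the task outputs with fetchOutputs (Jobs) after the job finishes. The task function and its inputs are illustrative.

```matlab
% Minimal sketch: collect task outputs from a finished independent job.
c = parcluster;
job = createJob(c);
createTask(job,@sum,1,{[1 2 3]});   % one task returning one output
submit(job)
wait(job)                           % fetchOutputs errors on a running job
out = fetchOutputs(job);            % cell array of outputs from all tasks
```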
When you create a job on a cluster, the software automatically creates a ValueStore object for the job, and you can use it to store data generated during job execution. Unlike the load and fetchOutputs functions, the ValueStore object does not automatically store data. Instead, you must manually add data as key-value pairs to the ValueStore object. Workers can store data in the ValueStore object that the MATLAB client can retrieve during the job execution. Additionally, the ValueStore object is not held in system memory, so you can store large results in the store.
To retrieve the results of a job after the job has finished, use the load or fetchOutputs (Jobs) function. To access the results or track the progress of a job while it is still running, or to store potentially high-memory results, use the ValueStore object.
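As a sketch of the during-execution pattern, this example polls a job's ValueStore from the client. It assumes a placeholder script, myLongScript, whose code writes the key checkpoint1 to the store with getCurrentValueStore.

```matlab
% Minimal sketch: read an intermediate result from a job's ValueStore while
% the job runs. The script and key names are placeholder assumptions.
c = parcluster;
job = batch(c,"myLongScript");       % placeholder script name
store = job.ValueStore;
pause(30)                            % give the job time to store a result
if isKey(store,"checkpoint1")
    partial = store("checkpoint1");  % retrieve without waiting for the job
end
```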
AdditionalPaths vs. AttachedFiles vs. AutoAttachFiles
AdditionalPaths, AttachedFiles, and AutoAttachFiles are all parallel job properties that you can use to specify additional files and directories that are required to run parallel code on workers.
AdditionalPaths is a property you can use to add cluster file locations to the MATLAB path on all workers running your job. This is useful if the cluster storage holds large data files, functions, or libraries that the workers require but that are not on the MATLAB path by default.
The AttachedFiles property allows you to specify files or directories that are required by the workers but are not stored on the cluster storage. These files are copied to a temporary directory on each worker before the parallel code runs. The files can be scripts, functions, or data files, and must be located within the directory structure of the client.
Use the AutoAttachFiles property to allow files needed by the workers to be automatically attached to the job. When you submit a job or task, MATLAB performs dependency analysis on all the task functions, or on the batch job script or function, and then automatically adds the required files to the job or task object so they are transferred to the workers. Set the AutoAttachFiles property to false only if you know that you do not need the software to identify the files for you, for example, if the files your job is going to use are already present on the cluster, perhaps inside one of the AdditionalPaths locations.
Use AdditionalPaths when you have functions and libraries stored on the cluster that are required on all workers. Use AttachedFiles when you have small files that are required to run your code. To let MATLAB automatically determine whether a job requires additional files to run, set the AutoAttachFiles property to true.
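For illustration, this minimal sketch combines the three properties in a single batch call; the cluster path, script name, and file name are placeholder assumptions.

```matlab
% Minimal sketch: point workers at code stored on the cluster, attach a
% small local file, and disable automatic dependency analysis.
c = parcluster;
job = batch(c,"myAnalysisScript", ...
    "AdditionalPaths","/cluster/share/mylibs", ...  % code already on the cluster
    "AttachedFiles","helperFunction.m", ...         % small local file to copy over
    "AutoAttachFiles",false);                       % dependencies handled manually
```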
See Also
ValueStore | FileStore | parallel.pool.Constant | parallel.pool.PollableDataQueue | spmdSend | spmdReceive | spmdSendReceive | spmdBarrier | fetchOutputs (parfeval) | fetchOutputs (Jobs) | load | parallel.pool.DataQueue