FileStore
File storage shared by MATLAB clients and workers
Since R2022a
Description
FileStore is an object that stores files owned by a specific job. Each entry of the object consists of a file and its corresponding key. When the owning job is deleted, the FileStore object is deleted as well. Use FileStore to store files from MATLAB® workers that can be retrieved by MATLAB clients during the execution of a job (even while the job is still running).
- Any MATLAB process, client or worker, can write an entry to the FileStore at any time. Any MATLAB process, client or worker, can then read this entry from the FileStore at any time. However, the ordering of operations executed by different processes is not guaranteed.
- FileStore can be used to return files when a cluster has no shared file system, or to run code that is not concerned about the location of any shared file system.
- FileStore is not held in system memory, so it can be used to store large results.
Creation
The FileStore object is automatically created when you create:
- A job on a cluster, which is a parallel.Job object. To create a job, use the batch, createJob, or createCommunicatingJob function.
- A parallel pool of process workers on the local machine, which is a ProcessPool object. To create a process pool, use the parpool function.
- A parallel pool of thread workers on the local machine, which is a ThreadPool object. To create a thread pool, use the parpool function. (since R2023b)
- A parallel pool of workers on a cluster of machines, which is a ClusterPool object. To create a cluster pool, use the parpool function.
You can access the FileStore object on a worker by using the getCurrentFileStore function. You can then retrieve the FileStore object on a client by using the FileStore property that is associated with the job or the parallel pool. For example, see Run Batch Job and Retrieve Files from Workers.
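The two access paths described above can be sketched side by side. This is a minimal illustration, assuming Parallel Computing Toolbox and a working 'Processes' profile; the variable names (workerKeys, clientKeys) are hypothetical and not part of the original examples.

```matlab
% Reuse the current pool if one exists; otherwise start a process pool.
pool = gcp("nocreate");
if isempty(pool)
    pool = parpool("Processes");
end

% On a worker: reach the store through getCurrentFileStore and list its keys.
future = parfeval(pool,@() keys(getCurrentFileStore),1);
workerKeys = fetchOutputs(future);

% On the client: reach the same store through the FileStore property of the pool.
clientKeys = keys(pool.FileStore);
```

Both calls refer to the store owned by the same pool. Because the ordering of operations across processes is not guaranteed, the two key lists can differ if other code is writing entries at the same time.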
Properties
KeyUpdatedFcn
— Callback executed when entry is added or replaced
function handle
Callback executed when an entry is added or replaced, specified as a function handle. The function handle must accept two input arguments that represent the FileStore object and its key when an entry is added or replaced.
KeyRemovedFcn
— Callback executed when entry is removed
function handle
Callback executed when an entry is removed, specified as a function handle. The function handle must accept two input arguments that represent the FileStore object and its key when an entry is removed.
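As a rough sketch of the two-argument callback signature, the following assigns anonymous functions to both properties and then adds and removes one entry. The key "callback_demo", the messages, and the temporary file are hypothetical and chosen only for illustration; the sketch assumes Parallel Computing Toolbox and a working 'Processes' profile.

```matlab
% Reuse the current pool if one exists; otherwise start a process pool.
pool = gcp("nocreate");
if isempty(pool)
    pool = parpool("Processes");
end
store = pool.FileStore;

% Each callback takes two inputs: the FileStore object and the affected key.
store.KeyUpdatedFcn = @(s,key) fprintf("Entry %s added or replaced.\n",key);
store.KeyRemovedFcn = @(s,key) fprintf("Entry %s removed.\n",key);

% Add and then remove a hypothetical entry to exercise both callbacks.
sourceFile = strcat(tempname,".mat");
x = 1;
save(sourceFile,"x");
copyFileToStore(store,sourceFile,"callback_demo");
remove(store,"callback_demo");
```

The page does not state exactly when the callbacks run relative to the call that triggers them, so no output is shown here.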
Object Functions
Function | Description |
---|---|
isKey | Determine if ValueStore or FileStore object contains keys |
keys | Return all keys of ValueStore or FileStore object |
copyFileToStore | Copy files from local file system to FileStore object |
copyFileFromStore | Copy files from FileStore object to local file system |
remove | Remove entries from ValueStore or FileStore object |
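To show how these functions fit together, here is a minimal sketch that round-trips one file through the FileStore of a parallel pool from the client. The key "demo_entry", the variable names, and the temporary files are illustrative assumptions; the sketch assumes Parallel Computing Toolbox and a working 'Processes' profile.

```matlab
% Reuse the current pool if one exists; otherwise start a process pool.
pool = gcp("nocreate");
if isempty(pool)
    pool = parpool("Processes");
end
store = pool.FileStore;

% Create a small MAT-file to act as the source file (hypothetical data).
sourceFile = strcat(tempname,".mat");
x = magic(4);
save(sourceFile,"x");

% Copy the file into the store under a key, then check that the key exists.
key = "demo_entry";   % hypothetical key name
copyFileToStore(store,sourceFile,key);
hasKey = isKey(store,key);
allKeys = keys(store);

% Copy the entry back out to a new local file and load it.
destinationFile = strcat(tempname,".mat");
copyFileFromStore(store,key,destinationFile);
loaded = load(destinationFile);

% Remove the entry when it is no longer needed.
remove(store,key);
stillHasKey = isKey(store,key);
```

Removing the entry at the end keeps the store from accumulating files for the lifetime of the pool.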
Examples
Run Batch Job and Retrieve Files from Workers
Run a simulation on workers and retrieve the file storage of the job on a client. The file storage is a FileStore object with key-file entries.
The following simulation finds the average and standard deviation of random matrices and stores the results in the FileStore object.
function workerStatsCode(models)
    % Get the FileStore of the current job
    store = getCurrentFileStore;
    for i = 1:numel(models)
        % Compute the average and standard deviation of random matrices
        A = rand(models(i));
        M = mean(A);
        S = std(A);
        % Save simulation results in temporary files
        sourceTempFile = strcat(tempname("C:\myTempFolder"),".mat");
        save(sourceTempFile,"M","S");
        % Copy files to FileStore object as key-file pairs
        key = strcat("result_",num2str(i));
        copyFileToStore(store,sourceTempFile,key);
    end
end
The following callback function is executed when a file is copied to the FileStore object.
function fileNewEntry(store,key)
    destination = strcat(key,".mat");
    fprintf("Result %s added. Copying to local file system: %s\n",key,destination);
    copyFileFromStore(store,key,destination);
end
Run a batch job on workers using the default cluster profile.
models = [4,8,32,20];
c = parcluster;
job = batch(c,@workerStatsCode,0,{models});
Retrieve the FileStore object on the client while the job is still running. Show the progress of the job.
store = job.FileStore;
store.KeyUpdatedFcn = @fileNewEntry;
wait(job);
Result result_1 added. Copying to local file system: result_1.mat
Result result_2 added. Copying to local file system: result_2.mat
Result result_3 added. Copying to local file system: result_3.mat
Result result_4 added. Copying to local file system: result_4.mat
Display all the information on the variables stored in the file "result_3.mat".
whos -file 'result_3.mat'
Name Size Bytes Class Attributes
M 1x32 256 double
S 1x32 256 double
Run Simulation on Parallel Pool of Process Workers and Retrieve Files
Run a simulation on a parallel pool of process workers and retrieve the file storage on a client.
The following simulation finds the average and standard deviation of random matrices and stores the results in the FileStore object.
function workerStatsCode(models)
    % Get the FileStore of the current job
    store = getCurrentFileStore;
    for i = 1:numel(models)
        % Compute the average and standard deviation of random matrices
        A = rand(models(i));
        M = mean(A);
        S = std(A);
        % Save simulation results in temporary files
        sourceTempFile = strcat(tempname("C:\myTempFolder"),".mat");
        save(sourceTempFile,"M","S");
        % Copy files to FileStore object as key-file pairs
        key = strcat("result_",num2str(i));
        copyFileToStore(store,sourceTempFile,key);
    end
end
The following callback function is executed when a file is copied to the FileStore object.
function fileNewEntry(store,key)
    destination = strcat(key,".mat");
    fprintf("Result %s added. Copying to local file system: %s\n",key,destination);
    copyFileFromStore(store,key,destination);
end
Start a parallel pool of process workers.
pool = parpool('Processes');
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 6 workers.
Get the FileStore for this pool and assign the callback function to be executed when an entry is added.
store = pool.FileStore;
store.KeyUpdatedFcn = @fileNewEntry;
Run the simulation on the pool.
models = [4,8,32,20];
future = parfeval(@workerStatsCode,0,models);
wait(future);
Result result_1 added. Copying to local file system: result_1.mat
Result result_2 added. Copying to local file system: result_2.mat
Result result_3 added. Copying to local file system: result_3.mat
Result result_4 added. Copying to local file system: result_4.mat
Display the variables stored in the local file result_3.mat.
whos -file 'result_3.mat'
Name Size Bytes Class Attributes
M 1x32 256 double
S 1x32 256 double
Run Independent Tasks and Retrieve Data and Files from All Tasks
Run a job of independent tasks. Then, retrieve the data and file storage of the job on a client.
The following simulation finds the permutations and combinations of a vector, and stores the results in the ValueStore and FileStore objects.
function taskFunction(dataset,keyname)
    % Get the ValueStore and FileStore of the current job
    valueStore = getCurrentValueStore;
    fileStore = getCurrentFileStore;
    % Run the simulation to find permutation and combination
    [result,logFile] = runSimulation(dataset);
    % Store results in ValueStore to release system memory
    valueStore(keyname) = result;
    % Copy file to FileStore to retrieve the file from non-shared file system
    copyFileToStore(fileStore,logFile,keyname);
end
function [result,logFile] = runSimulation(dataset)
    permutations = perms(dataset{1});
    combinations = nchoosek(dataset{1},dataset{2});
    result.N_perm = length(permutations);
    result.N_comb = length(combinations);
    logFile = strcat(tempname("C:\myLogFolder"),".mat");
    save(logFile,"permutations","combinations")
end
Create a job using the default cluster profile.
c = parcluster;
job = createJob(c);
Create independent tasks for the job. Each task runs the simulation with the given input.
set_1 = {[12,34,54],2};
set_2 = {[45,33],1};
set_3 = {[12,12,12,13,14],3};
tasks = createTask(job,@taskFunction,0,{{set_1,"sim_1"},{set_2,"sim_2"},{set_3,"sim_3"}});
Run the job and wait for it to finish.
submit(job);
wait(job);
Retrieve the data and file storage of the job.
valueStore = job.ValueStore;
fileStore = job.FileStore;
Show the result of the third task that is stored in the ValueStore object.
result_3 = valueStore("sim_3")
result_3 = struct with fields:
    N_perm: 120
    N_comb: 10
Copy files from the file storage as specified by the corresponding keys "sim_1" and "sim_2" to the local files "analysis_1.mat" and "analysis_2.mat".
copyFileFromStore(fileStore,["sim_1" "sim_2"],["analysis_1.mat" "analysis_2.mat"]);
Display all the information on the variables stored in the local files.
whos -file 'analysis_1.mat'
Name Size Bytes Class Attributes
combinations 3x2 48 double
permutations 6x3 144 double
whos -file 'analysis_2.mat'
Name Size Bytes Class Attributes
combinations 2x1 16 double
permutations 2x2 32 double
Limitations
- When using parallel.cluster.Generic clusters with 'HasSharedFileSystem' set to false, the visibility of modifications made to FileStore while a job is running depends on your specific implementation. Without additional synchronization between the MATLAB client and worker JobStorageLocation, changes might only be visible once the job has completed.
Version History
Introduced in R2022a
R2023b: Use FileStore on thread-based parallel pools
You can now use FileStore on ThreadPool objects.
FileStore is not supported on MATLAB backgroundPool.