TallDatastore - Datastore for checkpointing tall arrays - MATLAB
Datastore for checkpointing tall arrays
Description
TallDatastore objects are for recreating tall arrays from binary files written to disk by the write function. You can use the object to recreate the original tall array, or you can access and manage the data by specifying TallDatastore properties and using the object functions.
Creation
Create TallDatastore objects using the datastore function. For example, tds = datastore(location,"Type","tall") creates a datastore from a collection of files specified by location.
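As a minimal sketch of the creation syntax above, the following writes a small tall array to a scratch folder and then opens that folder as a TallDatastore. The scratch folder returned by tempname and the sample data are assumptions chosen for illustration; any folder previously produced by write should behave the same way.

```matlab
% Hypothetical scratch folder; tempname returns a unique path under tempdir.
location = tempname;

% Small in-memory tall column vector used as sample data (an assumption).
t = tall((1:500)');

% Checkpoint the tall array to binary files in the scratch folder.
write(location, t);

% Open the checkpoint files, stating the datastore type explicitly.
tds = datastore(location, "Type", "tall");
disp(class(tds))
```

Passing "Type","tall" makes the intent explicit; the example at the end of this page calls datastore(location) without it.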
Properties
Files — Files included in datastore
character vector | cell array of character vectors | string scalar | string array
Files included in the datastore, resolved as a character vector, cell array of character vectors, string scalar, or string array, where each character vector or string is a full path to a file.
The location argument of the datastore function defines the Files property when the datastore is created. The location argument contains full paths to files on a local file system, a network file system, or a supported remote location such as Amazon S3™, Windows Azure® Blob Storage, and HDFS™. For more information, see Work with Remote Data.
The files must be either MAT-files or Sequence files generated by the write function.
Example: ["C:\dir\data\file1.ext";"C:\dir\data\file2.ext"]
Example: ["s3://bucketname/path_to_files/your_file01.ext";"s3://bucketname/path_to_files/your_file02.ext"]
Data Types: char | cell | string
FileType — File type
"mat" | "seq"
File type, specified as either "mat" for MAT-files or "seq" for sequence files. By default, the type of file in the provided location determines the FileType.
Data Types: char | string
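To see how the Files and FileType properties described above are populated, the sketch below checkpoints a small tall array and inspects both properties. The scratch folder and sample data are assumptions for illustration, and the exact file names that write generates are not shown because they can vary.

```matlab
% Hypothetical scratch folder and sample data (assumptions for illustration).
location = tempname;
write(location, tall((1:500)'));

tds = datastore(location, "Type", "tall");

% Files holds the resolved full paths of the checkpoint files.
files = string(tds.Files);
fprintf("Number of files: %d\n", numel(files));

% FileType is inferred from the files found in the location.
fileType = string(tds.FileType);
fprintf("File type: %s\n", fileType);
```

Converting both properties with string is a convenience so the rest of the code does not depend on whether a value comes back as a character vector, cell array, or string.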
ReadSize — Maximum number of data rows to read
positive integer
Maximum number of data rows to read in a call to the read or preview functions, specified as a positive integer. When the datastore function creates a TallDatastore, it determines and assigns the best possible value for ReadSize.
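The effect of ReadSize can be sketched by lowering it and reading the datastore in chunks. This example assumes that ReadSize can be assigned after the datastore is created and that read returns the stored rows directly; the scratch folder, the sample data, and the value 100 are illustrative choices.

```matlab
% Hypothetical scratch folder and sample data (assumptions for illustration).
location = tempname;
write(location, tall((1:500)'));

tds = datastore(location, "Type", "tall");

% Cap the number of rows returned by each call to read.
tds.ReadSize = 100;

% Read the datastore chunk by chunk and count the rows seen.
totalRows = 0;
maxChunkRows = 0;
while hasdata(tds)
    chunk = read(tds);
    totalRows = totalRows + size(chunk, 1);
    maxChunkRows = max(maxChunkRows, size(chunk, 1));
end
fprintf("Read %d rows, largest chunk had %d rows\n", totalRows, maxChunkRows);
```

A smaller ReadSize lowers peak memory per read at the cost of more read calls; the value that datastore assigns automatically is usually a reasonable starting point.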
AlternateFileSystemRoots — Alternate file system root paths
string vector | cell array
Alternate file system root paths, specified as the name-value argument consisting of "AlternateFileSystemRoots" and a string vector or a cell array. Use "AlternateFileSystemRoots" when you create a datastore on a local machine but need to access and process the data on another machine (possibly running a different operating system). Also, when you process data using Parallel Computing Toolbox™ and MATLAB® Parallel Server™, and the data is stored on your local machines with a copy available on cloud or cluster machines of a different platform, you must use "AlternateFileSystemRoots" to associate the root paths.
- To associate a set of root paths that are equivalent to one another, specify "AlternateFileSystemRoots" as a string vector. For example,
  ["Z:\datasets","/mynetwork/datasets"]
- To associate multiple sets of root paths that are equivalent for the datastore, specify "AlternateFileSystemRoots" as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:
  - Specify "AlternateFileSystemRoots" as a cell array of string vectors.
    {["Z:\datasets", "/mynetwork/datasets"];...
     ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}
  - Alternatively, specify "AlternateFileSystemRoots" as a cell array of cell arrays of character vectors.
    {{'Z:\datasets','/mynetwork/datasets'};...
     {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}
The value of "AlternateFileSystemRoots" must satisfy these conditions:
- Contains one or more rows, where each row specifies a set of equivalent root paths.
- Each row specifies multiple root paths and each root path must contain at least two characters.
- Root paths are unique and are not subfolders of one another.
- Contains at least one root path entry that points to the location of the files.
For more information, see Set Up Datastore for Processing on Different Machines or Clusters.
Example: ["Z:\datasets","/mynetwork/datasets"]
Data Types: string | cell
Object Functions
Examples
Recreate tall Arrays from Files Saved Using write Function
Use TallDatastore objects to reconstruct tall arrays directly from files on disk rather than re-executing all of the commands that produced the tall array. Create a tall array and save it to disk using the write function. Retrieve the tall array using datastore and then convert it back to tall.
Create a simple tall double.
t = tall(rand(500,1))
t =
500×1 tall double column vector
0.8147
0.9058
0.1270
0.9134
0.6324
0.0975
0.2785
0.5469
:
:
Save the results to a new folder named Folder1.
location = fullfile(matlabroot,"toolbox","matlab","demos","Folder1");
write(location, t);
Writing tall data to folder H:\matlab\toolbox\matlab\demos\Folder1
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.063 sec
Evaluation completed in 0.16 sec
To recover the tall array that was written to disk, first create a new datastore that references the same directory. Then convert the datastore into a tall array.
tds = datastore(location);
t1 = tall(tds)
t1 =
M×1 tall double column vector
0.8147
0.9058
0.1270
0.9134
0.6324
0.0975
0.2785
0.5469
:
:
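One way to confirm that a checkpoint round-trips correctly is to gather the recovered tall array and compare it with the data that was written. The sketch below does this in a scratch folder; the folder from tempname and the variable names x, recovered, and roundTripOK are assumptions for illustration, and gathering is only practical when the data fits in memory.

```matlab
% Hypothetical scratch folder (an assumption; the example above writes
% under matlabroot instead).
location = tempname;

% Keep the in-memory source so it can be compared after the round trip.
x = rand(500, 1);
t = tall(x);

% Checkpoint the tall array, then rebuild it from the files on disk.
write(location, t);
tds = datastore(location);
t1 = tall(tds);

% Bring the recovered data into memory and compare with the source.
recovered = gather(t1);
roundTripOK = isequal(recovered, x);
disp(roundTripOK)
```

For data too large to gather, comparing a few summary values such as the row count or the sum of each array is a lighter-weight check.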
Version History
Introduced in R2016b