KeyValueDatastore - Datastore for key-value pair data for use with mapreduce - MATLAB ([original](https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.keyvaluedatastore.html))
Datastore for key-value pair data for use with mapreduce
Description
KeyValueDatastore objects are associated with files containing key-value pair data that are outputs of or inputs to mapreduce. Use the KeyValueDatastore properties to specify how you want to access the data. Use dot notation to view or modify a particular property of a KeyValueDatastore object:
ds = datastore("mapredout.mat");
ds.ReadSize = 20;
You can also specify the value of KeyValueDatastore properties using name-value arguments when you create a datastore using the datastore function:
ds = datastore("mapredout.mat","ReadSize",20);
Creation
Create KeyValueDatastore objects using the datastore function.
Properties
Files
— Files included in datastore
cell array of character vectors | string array
Files included in the datastore, specified as an n-by-1 cell array of character vectors or string array, where each character vector or string is a full path to a file. These are the files defined by the location argument to the datastore function. The location argument contains full paths to files on a local file system, a network file system, or a supported remote location such as Amazon S3™, Windows Azure® Blob Storage, and HDFS™. For more information, see Work with Remote Data.
The files must be either MAT-files or sequence files generated by the mapreduce function.
Example: ["C:\dir\data\file1.mat";"C:\dir\data\file2.mat"]
Example: ["s3://bucketname/path_to_files/your_file01.mat";"s3://bucketname/path_to_files/your_file02.mat"]
Data Types: cell | string
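As a rough illustration of the Files property, the following sketch creates a datastore from the sample file mapredout.mat used elsewhere on this page and lists the stored paths. It assumes the sample file is on the MATLAB path; the variable names files and numFiles are chosen here for illustration only.

```matlab
% Create a KeyValueDatastore from the sample mapreduce output file.
ds = datastore("mapredout.mat","Type","keyvalue");

% Files holds one full path per file in the datastore.
files = ds.Files;
numFiles = numel(files);

% Display each path (brace indexing works for cell and string arrays).
for k = 1:numFiles
    disp(files{k})
end
```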
FileType
— File type
"mat" (default) | "seq"
File type, specified as either "mat" for MAT-files or "seq" for sequence files. By default, the output of mapreduce running against Hadoop® is a datastore containing sequence files. By default, the output of all other mapreduce operations is a datastore containing MAT-files.
Data Types: cell | string
ReadSize
— Maximum number of key-value pairs to read
1 (default) | positive integer
Maximum number of key-value pairs to read in a call to the read or preview functions, specified as a positive integer.
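A minimal sketch of how ReadSize bounds a single call to read or preview, again assuming the sample file mapredout.mat is on the MATLAB path. The value 3 and the variable names are arbitrary choices for illustration.

```matlab
% Set ReadSize at creation time using a name-value argument.
ds = datastore("mapredout.mat","Type","keyvalue","ReadSize",3);

% Each call to read returns a table with at most ReadSize rows.
T = read(ds);
numPairs = height(T);

% preview is also capped by ReadSize and does not advance the read position.
P = preview(ds);
```

Larger values of ReadSize mean fewer calls to read when looping over the whole datastore, at the cost of holding more pairs in memory at once.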
AlternateFileSystemRoots
— Alternate file system root paths
string vector | cell array
Alternate file system root paths, specified as the name-value argument consisting of "AlternateFileSystemRoots" and a string vector or a cell array. Use "AlternateFileSystemRoots" when you create a datastore on a local machine, but need to access and process the data on another machine (possibly of a different operating system). Also, when processing data using the Parallel Computing Toolbox™ and the MATLAB® Parallel Server™, and the data is stored on your local machines with a copy of the data available on different platform cloud or cluster machines, you must use "AlternateFileSystemRoots" to associate the root paths.
- To associate a set of root paths that are equivalent to one another, specify "AlternateFileSystemRoots" as a string vector. For example,
  ["Z:\datasets","/mynetwork/datasets"]
- To associate multiple sets of root paths that are equivalent for the datastore, specify "AlternateFileSystemRoots" as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:
  - Specify "AlternateFileSystemRoots" as a cell array of string vectors.
    {["Z:\datasets", "/mynetwork/datasets"];...
    ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}
  - Alternatively, specify "AlternateFileSystemRoots" as a cell array of cell arrays of character vectors.
    {{'Z:\datasets','/mynetwork/datasets'};...
    {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}
The value of "AlternateFileSystemRoots" must satisfy these conditions:
- Contains one or more rows, where each row specifies a set of equivalent root paths.
- Each row specifies multiple root paths and each root path must contain at least two characters.
- Root paths are unique and are not subfolders of one another.
- Contains at least one root path entry that points to the location of the files.
For more information, see Set Up Datastore for Processing on Different Machines or Clusters.
Example: ["Z:\datasets","/mynetwork/datasets"]
Data Types: string | cell
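To sketch how this might look at creation time, the snippet below passes one set of equivalent root paths to the datastore function. The placement of mapredout.mat under Z:\datasets is hypothetical, so the call succeeds only if that file actually exists at one of the listed roots.

```matlab
% One set of equivalent root paths (a Windows drive and a Linux mount).
altRoots = ["Z:\datasets","/mynetwork/datasets"];

% Hypothetical location: assumes mapredout.mat has been copied to Z:\datasets.
ds = datastore("Z:\datasets\mapredout.mat", ...
    "Type","keyvalue", ...
    "AlternateFileSystemRoots",altRoots);
```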
Object Functions
Examples
Set Number of Key-Value Pairs to Read
Create a datastore from the sample file, mapredout.mat, which is an output file of the mapreduce function.
fs = matlab.io.datastore.FileSet("mapredout.mat");
ds = datastore(fs,"type","keyvalue")
ds =
  KeyValueDatastore with properties:

                       Files: {
                              '...\matlab\toolbox\matlab\demos\mapredout.mat'
                              }
                    ReadSize: 1 key-value pairs
                    FileType: 'mat'
    AlternateFileSystemRoots: {}
Set the ReadSize property to 8 so that each call to read reads at most 8 key-value pairs.
ds.ReadSize = 8
ds =
  KeyValueDatastore with properties:

                       Files: {
                              '...\matlab\toolbox\matlab\demos\mapredout.mat'
                              }
                    ReadSize: 8 key-value pairs
                    FileType: 'mat'
    AlternateFileSystemRoots: {}
Read 8 key-value pairs at a time using the read function in a while loop. The loop executes until there is no more data available to read and hasdata(ds) returns false.
while hasdata(ds)
    T = read(ds);
end
Show the last set of key-value pairs read.
T=5×2 table
Key Value
______ ________
{'OO'} {[3090]}
{'TZ'} {[ 216]}
{'XE'} {[2357]}
{'9E'} {[ 521]}
{'YV'} {[ 849]}
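As a cross-check on the loop above, the following sketch counts the pairs read in chunks and compares the total with readall, which returns every key-value pair in a single table. It assumes the sample file is on the MATLAB path; the variable names numRead and allPairs are illustrative.

```matlab
ds = datastore("mapredout.mat","Type","keyvalue","ReadSize",8);

% Count the pairs returned across all chunked reads.
numRead = 0;
while hasdata(ds)
    T = read(ds);
    numRead = numRead + height(T);
end

% readall returns all pairs in the datastore as one table.
allPairs = readall(ds);

% Displays logical 1 when the chunked total matches the full table.
disp(numRead == height(allPairs))
```

To read the data in chunks again after the loop finishes, call reset(ds) to return the datastore to its initial state.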
Limitations
KeyValueDatastore does not support sequence files written in R2013b. Rewrite the sequence files using a version of MATLAB between R2014a and R2018a.
Version History
Introduced in R2014b