KeyValueDatastore - Datastore for key-value pair data for use with mapreduce - MATLAB ([original](https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.keyvaluedatastore.html))

Datastore for key-value pair data for use with mapreduce

Description

KeyValueDatastore objects are associated with files containing key-value pair data that are outputs of or inputs to mapreduce. Use the KeyValueDatastore properties to specify how you want to access the data. Use dot notation to view or modify a particular property of a KeyValueDatastore object:

ds = datastore("mapredout.mat");
ds.ReadSize = 20;

You can also specify the value of KeyValueDatastore properties using name-value arguments when you create a datastore using the datastore function:

ds = datastore("mapredout.mat","ReadSize",20);
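As a minimal sketch of the two approaches side by side, the following creates one datastore and sets ReadSize with dot notation, then creates a second with the name-value argument, and checks that both end up with the same setting. It assumes the sample file mapredout.mat is on the MATLAB path; the variable names ds1, ds2, and sameReadSize are illustrative only.

```matlab
% Assumption: the sample file mapredout.mat is on the MATLAB path.
ds1 = datastore("mapredout.mat");
ds1.ReadSize = 20;                                % set after creation (dot notation)

ds2 = datastore("mapredout.mat","ReadSize",20);   % set at creation (name-value argument)

% Both datastores should report the same ReadSize.
sameReadSize = isequal(ds1.ReadSize, ds2.ReadSize);
disp(sameReadSize)
```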

Creation

Create KeyValueDatastore objects using the datastore function.

Properties

Files — Files included in datastore

cell array of character vectors | string array

Files included in the datastore, specified as an n-by-1 cell array of character vectors or string array, where each character vector or string is a full path to a file. These are the files defined by the location argument to the datastore function. The location argument contains full paths to files on a local file system, a network file system, or a supported remote location such as Amazon S3™, Windows Azure® Blob Storage, and HDFS™. For more information, see Work with Remote Data.

The files must be either MAT-files or sequence files generated by the mapreduce function.

Example: ["C:\dir\data\file1.mat";"C:\dir\data\file2.mat"]

Example: ["s3://bucketname/path_to_files/your_file01.mat";"s3://bucketname/path_to_files/your_file02.mat"]

Data Types: cell | string

FileType — File type

"mat" (default) | "seq"

File type, specified as either "mat" for MAT-files or "seq" for sequence files. By default, the output of mapreduce running against Hadoop® is a datastore containing sequence files. By default, the output of all other mapreduce operations is a datastore containing MAT-files.

Data Types: char | string
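The Files and FileType properties can be inspected with dot notation once the datastore exists. The following is a small sketch, assuming the sample file mapredout.mat is on the MATLAB path; the variable names filePaths and fileType are illustrative.

```matlab
% Assumption: the sample file mapredout.mat is on the MATLAB path.
ds = datastore("mapredout.mat");

filePaths = ds.Files;      % cell array holding one full path per file
fileType  = ds.FileType;   % 'mat' for MAT-files, 'seq' for sequence files

disp(filePaths{1})         % full path that the location argument resolved to
disp(fileType)
```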

ReadSize — Maximum number of key-value pairs to read

1 (default) | positive integer

Maximum number of key-value pairs to read in a call to the read or preview functions, specified as a positive integer.
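To illustrate that ReadSize is an upper bound rather than an exact count, this sketch reads one batch and checks its height against the setting. It assumes the sample file mapredout.mat is on the MATLAB path; the value 3 and the variable names are arbitrary choices for illustration.

```matlab
% Assumption: the sample file mapredout.mat is on the MATLAB path.
ds = datastore("mapredout.mat","ReadSize",3);

T = read(ds);          % table of key-value pairs, at most ReadSize rows
nRead = height(T);     % number of pairs actually returned by this call

% A call can return fewer pairs than ReadSize (for example, at the end
% of the data), but never more.
withinLimit = nRead <= ds.ReadSize;
disp(withinLimit)
```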

AlternateFileSystemRoots — Alternate file system root paths

string vector | cell array

Alternate file system root paths, specified as the name-value argument consisting of "AlternateFileSystemRoots" and a string vector or a cell array. Use "AlternateFileSystemRoots" when you create a datastore on a local machine but need to access and process the data on another machine (possibly running a different operating system). Also, when you process data using Parallel Computing Toolbox™ and MATLAB® Parallel Server™, and the data is stored on your local machines with a copy available on cloud or cluster machines that run a different platform, you must use "AlternateFileSystemRoots" to associate the root paths.

The value of "AlternateFileSystemRoots" must satisfy these conditions:

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell

Object Functions

Examples

Set Number of Key-Value Pairs to Read

Create a datastore from the sample file, mapredout.mat, which is an output file of the mapreduce function.

fs = matlab.io.datastore.FileSet("mapredout.mat");
ds = datastore(fs,"type","keyvalue")

ds = 
  KeyValueDatastore with properties:

                       Files: {
                              '...\matlab\toolbox\matlab\demos\mapredout.mat'
                              }
                    ReadSize: 1 key-value pairs
                    FileType: 'mat'
    AlternateFileSystemRoots: {}

Set the ReadSize property to 8 so that each call to read reads at most 8 key-value pairs.

ds.ReadSize = 8

ds = 
  KeyValueDatastore with properties:

                       Files: {
                              '...\matlab\toolbox\matlab\demos\mapredout.mat'
                              }
                    ReadSize: 8 key-value pairs
                    FileType: 'mat'
    AlternateFileSystemRoots: {}

Read 8 key-value pairs at a time using the read function in a while loop. The loop executes until there is no more data available to read and hasdata(ds) returns false.

while hasdata(ds)
    T = read(ds);
end

Show the last set of key-value pairs read.

T=5×2 table
     Key       Value  
    ______    ________

    {'OO'}    {[3090]}
    {'TZ'}    {[ 216]}
    {'XE'}    {[2357]}
    {'9E'}    {[ 521]}
    {'YV'}    {[ 849]}
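The last table has fewer than 8 rows because ReadSize is a maximum, so the final call returns whatever pairs remain. As a hedged sketch of how to confirm that the batched loop visits every pair, the following counts the pairs read in the loop, resets the datastore, and compares the count with the result of readall. The counters numPairs and numReads and the variable allPairs are illustrative names, not part of the original example.

```matlab
% Assumption: the sample file mapredout.mat is on the MATLAB path.
fs = matlab.io.datastore.FileSet("mapredout.mat");
ds = datastore(fs,"type","keyvalue");
ds.ReadSize = 8;

numPairs = 0;   % running total of key-value pairs read
numReads = 0;   % number of calls to read
while hasdata(ds)
    T = read(ds);                       % at most 8 pairs per call
    numPairs = numPairs + height(T);
    numReads = numReads + 1;
end

reset(ds);                 % return to the start of the data
allPairs = readall(ds);    % every key-value pair in one table

% The batched loop and readall should account for the same pairs.
disp(numPairs == height(allPairs))
```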

Limitations

Version History

Introduced in R2014b