KeyValueStore - Store key-value pairs for use with mapreduce - MATLAB

Store key-value pairs for use with mapreduce

Description

The mapreduce function automatically creates a KeyValueStore object during execution and uses it to store the key-value pairs that the map and reduce functions add. You never need to create a KeyValueStore object explicitly to use mapreduce, but you do need to call the add and addmulti object functions on this object inside the map and reduce functions.
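As a minimal sketch of that interaction, the following pair of functions shows where the KeyValueStore objects appear in the map and reduce function signatures. The function names exampleMapper and exampleReducer and the key 'rowCount' are hypothetical, and the sketch assumes each chunk of data arrives as a table. Each function is assumed to be saved in its own file on the MATLAB path, or defined as a local function in the script that calls mapreduce.

```matlab
function exampleMapper(data, info, intermKVStore)
% Hypothetical mapper: data is the current chunk (assumed to be a table),
% info is chunk metadata (unused here), and intermKVStore is the
% KeyValueStore that mapreduce created for intermediate pairs.
add(intermKVStore, 'rowCount', height(data));
end

function exampleReducer(intermKey, intermValIter, outKVStore)
% Hypothetical reducer: sums every value stored under intermKey and
% writes the total to outKVStore, the KeyValueStore for final output.
total = 0;
while hasnext(intermValIter)
    total = total + getnext(intermValIter);
end
add(outKVStore, intermKey, total);
end
```

In both functions the KeyValueStore is the last input argument, and neither function returns a value: all output goes through add or addmulti.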

Creation

The mapreduce function automatically creates KeyValueStore objects during execution.

Object Functions

add Add single key-value pair to KeyValueStore
addmulti Add multiple key-value pairs to KeyValueStore

Examples

The following map function uses the add function to add key-value pairs one at a time to an intermediate KeyValueStore object (named intermKVStore).

function MeanDistMapFun(data, info, intermKVStore)
    distances = data.Distance(~isnan(data.Distance));
    sumLenKey = 'sumAndLength';
    sumLenValue = [sum(distances), length(distances)];
    add(intermKVStore, sumLenKey, sumLenValue);
end
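A reduce function can use add in the same way to write to the output KeyValueStore object. The following is a sketch of a reducer that could pair with a map function like the one above. The name MeanDistReduceFun and the output key 'Mean' are assumptions made for illustration. It assumes every intermediate value is a two-element vector of the form [sum, count].

```matlab
function MeanDistReduceFun(intermKey, intermValIter, outKVStore)
% Combine the [sum, count] pairs emitted by the map function for each
% chunk, then store the overall mean under a single output key.
sumLen = [0, 0];
while hasnext(intermValIter)
    sumLen = sumLen + getnext(intermValIter);
end
add(outKVStore, 'Mean', sumLen(1) / sumLen(2));
end
```

With a datastore ds that exposes a Distance variable, a call of the form mapreduce(ds, @MeanDistMapFun, @MeanDistReduceFun) would run both functions. mapreduce supplies the KeyValueStore arguments itself.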

The following map function uses addmulti to add several key-value pairs to an intermediate KeyValueStore object (named intermKVStore). This map function collects multiple keys in the intermKeys variable and multiple values in the intermVals variable, so that a single call to addmulti can add all of the key-value pairs at once. It is a best practice to use a single call to addmulti rather than calling add in a loop.

function meanArrivalDelayByDayMapper(data, ~, intermKVStore)
% Mapper function for the MeanByGroupMapReduceExample.

% Copyright 2014 The MathWorks, Inc.

% Data is an n-by-2 table: first column is the DayOfWeek and the second
% is the ArrDelay. Remove missing values first.
delays = data.ArrDelay;
day = data.DayOfWeek;
notNaN = ~isnan(delays);
day = day(notNaN);
delays = delays(notNaN);

% find the unique days in this chunk
[intermKeys,~,idx] = unique(day, 'stable');

% group delays by idx and apply the countsum function to each group
intermVals = accumarray(idx,delays,size(intermKeys),@countsum);
addmulti(intermKVStore,intermKeys,intermVals);

function out = countsum(x)
n = length(x); % count
s = sum(x); % sum
out = {[n, s]};
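To see what the grouping step hands to addmulti, the following sketch runs the same unique and accumarray calls on a small made-up sample. The day and delay values are invented for illustration, and an anonymous function stands in for countsum so that the snippet runs on its own without mapreduce.

```matlab
% Invented sample chunk: five flights on three distinct days.
day    = [3; 1; 3; 2; 1];
delays = [10; 5; 20; 7; 15];

% Unique keys in order of first appearance, plus a group index per row.
[intermKeys, ~, idx] = unique(day, 'stable');

% One cell per key, each holding [count, sum] for that key's delays.
countsumFcn = @(x) {[length(x), sum(x)]};
intermVals = accumarray(idx, delays, size(intermKeys), countsumFcn);

% intermKeys is [3; 1; 2]
% intermVals is {[2 30]; [2 20]; [1 7]}
```

Here intermKeys is a numeric column vector and intermVals is a cell array with the same number of elements, which is the pairing that the map function passes to addmulti in a single call.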

Version History

Introduced in R2014b