Debug MapReduce Algorithms
This example shows how to debug your mapreduce algorithms in MATLAB® using a simple example file, MaxMapReduceExample.m. Debugging enables you to follow the movement of data between the different phases of mapreduce execution and inspect the state of all intermediate variables.
To begin, save the contents of the following functions as separate files in your current directory.
MaxMapReduceExample.m

    %% Find Maximum Value with MapReduce
    % This example shows how to find the maximum value of a single variable in
    % a data set using |mapreduce|. It demonstrates the simplest use of
    % |mapreduce| since there is only one key and minimal computation.

    % Copyright 1984-2014 The MathWorks, Inc.

    %% Prepare Data
    % Create a datastore using the |airlinesmall.csv| data set. This 12
    % megabyte data set contains 29 columns of flight information for several
    % airline carriers, including arrival and departure times. In this example,
    % select |ArrDelay| (flight arrival delay) as the variable of interest.
    ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
    ds.SelectedVariableNames = 'ArrDelay'

    %%
    % |tabularTextDatastore| returns a |TabularTextDatastore| object for the data. This
    % datastore treats |'NA'| strings as missing, and replaces the missing
    % values with |NaN| values by default. Additionally, the
    % |SelectedVariableNames| property allows you to work with only the
    % selected variable of interest, which you can verify using |preview|.
    preview(ds)

    %% Run MapReduce
    % The |mapreduce| function requires a mapper function and a reducer
    % function. The mapper function receives chunks of data and outputs
    % intermediate results. The reducer function reads the intermediate results
    % and produces a final result.

    %%
    % In this example, the mapper function finds the maximum arrival delay in
    % each chunk of data. The mapper function then stores these maximum values
    % as the intermediate values associated with the key
    % |'PartialMaxArrivalDelay'|.

    %%
    % Display the mapper function file.
    type maxArrivalDelayMapper.m

    %%
    % The reducer function receives a list of the maximum arrival delays for
    % each chunk and finds the overall maximum arrival delay from the list of
    % values. |mapreduce| only calls this reducer function once, since the
    % mapper function only adds a single unique key. The reducer function uses
    % |add| to add a final key-value pair to the output.

    %%
    % Display the reducer function file.
    type maxArrivalDelayReducer.m

    %%
    % Use |mapreduce| to apply the mapper and reducer functions to the
    % datastore, |ds|.
    maxDelay = mapreduce(ds, @maxArrivalDelayMapper, @maxArrivalDelayReducer);

    %%
    % |mapreduce| returns a datastore, |maxDelay|, with files in the
    % current folder.

    %%
    % Read the final result from the output datastore, |maxDelay|.
    readall(maxDelay)
maxArrivalDelayMapper.m

    function maxArrivalDelayMapper (data, info, intermKVStore)
    % Mapper function for the MaxMapreduceExample.

    % Copyright 1984-2014 The MathWorks, Inc.

    % Data is an n-by-1 table of the ArrDelay. As the data source is tabular,
    % the return of read is a table object.
    partMax = max(data.ArrDelay);
    add(intermKVStore, 'PartialMaxArrivalDelay',partMax);
maxArrivalDelayReducer.m

    function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
    % Reducer function for the MaxMapreduceExample.

    % Copyright 2014 The MathWorks, Inc.

    % intermKey is 'PartialMaxArrivalDelay'. intermValIter is an iterator of
    % all values that has the key 'PartialMaxArrivalDelay'.
    maxVal = -inf;
    while hasnext(intermValIter)
       maxVal = max(getnext(intermValIter), maxVal);
    end
    % The key-value pair added to outKVStore will become the output of mapreduce
    add(outKVStore,'MaxArrivalDelay',maxVal);
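Before stepping through the two functions in the debugger, it can help to see the same two-phase logic in plain MATLAB, without mapreduce. The following is a minimal sketch only: the sample vector `arrDelay` and the value of `chunkSize` are made up for illustration and are not part of the example files, and mapreduce chooses its own chunk boundaries.

```matlab
% Hypothetical sample data standing in for the ArrDelay column.
arrDelay = [12; -3; NaN; 45; 7; 120; -20; 88; NaN; 5];
chunkSize = 4;   % assumed chunk size, for illustration only

% "Map" phase: one partial maximum per chunk, as in maxArrivalDelayMapper.
numChunks = ceil(numel(arrDelay) / chunkSize);
partialMax = zeros(numChunks, 1);
for k = 1:numChunks
    first = (k - 1) * chunkSize + 1;
    last = min(k * chunkSize, numel(arrDelay));
    partialMax(k) = max(arrDelay(first:last));   % max skips NaN values
end

% "Reduce" phase: fold the partial maxima into one value,
% as in maxArrivalDelayReducer.
maxVal = -inf;
for k = 1:numChunks
    maxVal = max(partialMax(k), maxVal);
end

disp(partialMax')
disp(maxVal)
```

The partial maxima play the role of the values stored under the key 'PartialMaxArrivalDelay', and `maxVal` plays the role of the single value stored under 'MaxArrivalDelay'.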
Set Breakpoint
Set one or more breakpoints in your map or reduce function files so you can examine the variable values where you think the problem is. For more information, see Set Breakpoints.
Open the file maxArrivalDelayMapper.m.
edit maxArrivalDelayMapper.m
Set a breakpoint on line 9. This breakpoint causes execution of mapreduce to pause right before each call to the map function adds a key-value pair to the intermediate KeyValueStore object, named intermKVStore.
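The same breakpoint can also be set from the Command Window instead of by clicking in the Editor. A minimal sketch, assuming maxArrivalDelayMapper.m is saved on the MATLAB path exactly as listed earlier, so that the call to add falls on line 9:

```matlab
% Set a breakpoint at line 9 of the mapper file.
dbstop in maxArrivalDelayMapper at 9

% List the breakpoints currently set, to confirm the file and line number.
dbstatus
```

If the file layout differs (for example, extra blank lines), the line number in the dbstop command needs to be adjusted to match.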
Execute mapreduce
Run the mapreduce example file MaxMapReduceExample.m. Specify mapreducer(0) to ensure that the algorithm does not run in parallel, since parallel execution of mapreduce using Parallel Computing Toolbox™ ignores breakpoints.
mapreducer(0);
MaxMapReduceExample
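As an alternative sketch, the serial execution environment can be passed to mapreduce explicitly as a fourth argument rather than relying on the current global setting. This variation is not part of the example file; it assumes the mapper and reducer files above are on the path and that airlinesmall.csv is available.

```matlab
% Recreate the datastore used by the example.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';

% Create a serial mapreducer and hand it to mapreduce directly,
% so that breakpoints in the map and reduce functions are honored.
mr = mapreducer(0);
maxDelay = mapreduce(ds, @maxArrivalDelayMapper, @maxArrivalDelayReducer, mr);

readall(maxDelay)
```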
MATLAB stops execution of the file when it encounters the breakpoint in the map function. During the pause in execution, you can hover over the different variable names in the map function, or type one of the variable names at the command line to inspect the values.
In this case, the display indicates that, as yet, there are no key-value pairs in intermKVStore.
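While execution is paused, commands such as the following can be typed at the debug prompt (K>>) to inspect the workspace of the map function. This is only a sketch of a typical session: the variable names come from maxArrivalDelayMapper.m, and what is displayed depends on the chunk of data being processed.

```matlab
dbstack          % show the file and line where execution is paused
whos             % list the variables in the map function workspace
head(data)       % first rows of the current chunk, which is a table
partMax          % partial maximum computed on line 8
intermKVStore    % display of the intermediate KeyValueStore
```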
Step Through Map Function
- Continue past the breakpoint. You can use dbstep to execute a single line, or dbcont to continue execution until MATLAB encounters another breakpoint. Alternatively, you can click Step or Continue in the Editor tab. For more information about all the available options, see Debug MATLAB Code Files.
  In this case, use dbstep (or click Step) to execute only line 9, which adds a key-value pair to intermKVStore. Inspect the new display for intermKVStore.
- Now, use dbcont (or click Continue) to continue execution of mapreduce. During the next call to the map function, MATLAB halts again on line 9. The new display for intermKVStore indicates that it does not contain any key-value pairs, because the display is meant to show only the most recent key-value pairs that are added in the current call to the map (or reduce) function.
- Step past line 9 again using dbstep (or click Step) to add the next key-value pair to intermKVStore, and inspect the new display for the variable. MATLAB displays only the key-value pair added during the current call to the map function.
- Complete the debugging of the map function by removing the breakpoint and closing the file maxArrivalDelayMapper.m.
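The map-function steps above correspond roughly to the following commands, typed one at a time at the K>> prompt while execution is paused on line 9. This is a sketch of one possible session, not a script to run in one go.

```matlab
dbstep           % execute line 9 only, adding one key-value pair
intermKVStore    % inspect the pair added during this call

dbcont           % continue until line 9 is reached in the next map call
intermKVStore    % empty again: only pairs from the current call are shown

dbstep           % add the key-value pair for this chunk
intermKVStore    % inspect the newly added pair

% Remove all breakpoints from the mapper file when finished.
dbclear in maxArrivalDelayMapper
```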
Step Through Reduce Function
- You can use the same process to set breakpoints and step through execution of a reduce function. The reduce function for this example is maxArrivalDelayReducer.m. Open this file for editing.
  edit maxArrivalDelayReducer.m
- Set two breakpoints: one on line 10, and one on line 13. This enables you to inspect the ValueIterator and the final key-value pairs added to the output, outKVStore.
- Run the main example file.
- The execution of the example will pause when the breakpoint on line 10 is encountered. The debug display for the ValueIterator indicates the active key and whether any values remain to be retrieved.
- Now, remove the breakpoint on line 10 and use dbcont (or click Continue) to continue execution of the example until the next breakpoint is reached (on line 13). Since this reduce function continually compares each new value from the ValueIterator to the global maximum, mapreduce execution ends by adding a single key-value pair to outKVStore.
- Use dbstep (or click Step) to execute line 13 only. The display for outKVStore shows the global maximum value that mapreduce will return as the final answer.
- Now use dbcont (or click Continue) to advance execution, enabling the example to finish running.
mapreduce returns the final results.
    Map 100% Reduce 100%

    ans =

               Key             Value
        'MaxArrivalDelay'    [1014]
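The reduce-function walkthrough can likewise be driven from the Command Window. The sketch below assumes maxArrivalDelayReducer.m is saved as listed earlier, so that the call to getnext is on line 10 and the call to add is on line 13. The commands after MaxMapReduceExample are meant to be typed at the K>> prompt once execution pauses, not run as a script.

```matlab
% Set both breakpoints, then start the example in serial mode.
dbstop in maxArrivalDelayReducer at 10
dbstop in maxArrivalDelayReducer at 13
mapreducer(0);
MaxMapReduceExample

% Paused on line 10: inspect the ValueIterator, then move on.
intermValIter
dbclear in maxArrivalDelayReducer at 10
dbcont

% Paused on line 13: add the final key-value pair and inspect it.
dbstep
outKVStore

% Let the example run to completion.
dbcont
```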
For a complete guide to debugging in MATLAB, see Debugging and Analysis.