Debug MapReduce Algorithms

This example shows how to debug your mapreduce algorithms in MATLAB® using a simple example file, MaxMapReduceExample.m. Debugging enables you to follow the movement of data between the different phases of mapreduce execution and inspect the state of all intermediate variables.

To begin, save the contents of the following functions as separate files in your current directory.

MaxMapReduceExample.m

%% Find Maximum Value with MapReduce
% This example shows how to find the maximum value of a single variable in
% a data set using |mapreduce|. It demonstrates the simplest use of
% |mapreduce| since there is only one key and minimal computation.

% Copyright 1984-2014 The MathWorks, Inc.

%% Prepare Data
% Create a datastore using the |airlinesmall.csv| data set. This 12
% megabyte data set contains 29 columns of flight information for several
% airline carriers, including arrival and departure times. In this example,
% select |ArrDelay| (flight arrival delay) as the variable of interest.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay'

%%
% |tabularTextDatastore| returns a |TabularTextDatastore| object for the data. This
% datastore treats |'NA'| strings as missing, and replaces the missing
% values with |NaN| values by default. Additionally, the
% |SelectedVariableNames| property allows you to work with only the
% selected variable of interest, which you can verify using |preview|.
preview(ds)

%% Run MapReduce
% The |mapreduce| function requires a mapper function and a reducer
% function. The mapper function receives chunks of data and outputs
% intermediate results. The reducer function reads the intermediate results
% and produces a final result.

%%
% In this example, the mapper function finds the maximum arrival delay in
% each chunk of data. The mapper function then stores these maximum values
% as the intermediate values associated with the key
% |'PartialMaxArrivalDelay'|.

%%
% Display the mapper function file.
type maxArrivalDelayMapper.m

%%
% The reducer function receives a list of the maximum arrival delays for
% each chunk and finds the overall maximum arrival delay from the list of
% values. |mapreduce| only calls this reducer function once, since the
% mapper function only adds a single unique key. The reducer function uses
% |add| to add a final key-value pair to the output.

%%
% Display the reducer function file.
type maxArrivalDelayReducer.m

%%
% Use |mapreduce| to apply the mapper and reducer functions to the
% datastore, |ds|.
maxDelay = mapreduce(ds, @maxArrivalDelayMapper, @maxArrivalDelayReducer);

%%
% |mapreduce| returns a datastore, |maxDelay|, with files in the
% current folder.

%%
% Read the final result from the output datastore, |maxDelay|.
readall(maxDelay)

maxArrivalDelayMapper.m

function maxArrivalDelayMapper (data, info, intermKVStore)
% Mapper function for the MaxMapreduceExample.

% Copyright 1984-2014 The MathWorks, Inc.

% Data is an n-by-1 table of the ArrDelay. As the data source is tabular,
% the return of read is a table object.
partMax = max(data.ArrDelay);
add(intermKVStore, 'PartialMaxArrivalDelay',partMax);

maxArrivalDelayReducer.m

function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
% Reducer function for the MaxMapreduceExample.

% Copyright 2014 The MathWorks, Inc.

% intermKey is 'PartialMaxArrivalDelay'. intermValIter is an iterator of
% all values that has the key 'PartialMaxArrivalDelay'.
maxVal = -inf;
while hasnext(intermValIter)
    maxVal = max(getnext(intermValIter), maxVal);
end
% The key-value pair added to outKVStore will become the output of mapreduce
add(outKVStore,'MaxArrivalDelay',maxVal);
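
Before setting breakpoints, it can help to picture the data flow that the mapper and reducer listed above implement. The following is a minimal sketch in plain MATLAB that imitates that flow without calling mapreduce. The chunks cell array and its values are hypothetical stand-ins for the tables that the mapper receives, and the for loop stands in for the hasnext/getnext loop over the ValueIterator.

```matlab
% Hypothetical stand-in data: three "chunks" of arrival delays.
% NaN marks a missing value, as 'NA' entries do in the datastore.
chunks = {[5; NaN; 12], [-3; 40], [7; 1014; NaN]};

% Map phase (sketch): one partial maximum per chunk.
% max ignores NaN values by default.
partialMax = cellfun(@max, chunks);

% Reduce phase (sketch): fold the partial maxima into a global maximum,
% starting from -inf as maxArrivalDelayReducer does.
maxVal = -inf;
for k = 1:numel(partialMax)
    maxVal = max(partialMax(k), maxVal);
end

disp(partialMax)
disp(maxVal)
```

The partial maxima correspond to the values stored under the key 'PartialMaxArrivalDelay', and maxVal corresponds to the single value stored under 'MaxArrivalDelay'.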

Set Breakpoint

Set one or more breakpoints in your map or reduce function files so you can examine the variable values where you think the problem is. For more information, see Set Breakpoints.

Open the file maxArrivalDelayMapper.m.

edit maxArrivalDelayMapper.m

Set a breakpoint on line 9. This breakpoint causes execution of mapreduce to pause right before each call to the map function adds a key-value pair to the intermediate KeyValueStore object, named intermKVStore.

Screenshot of map function file with breakpoint added on line that calls the add method.
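
If you prefer the command line to the Editor, the same breakpoint can likely be set with dbstop. This sketch assumes that maxArrivalDelayMapper.m is saved on the MATLAB path with the layout listed earlier, so that the call to add falls on line 9.

```matlab
% Assumes maxArrivalDelayMapper.m is on the path and that line 9 is the
% call to add, as in the file listing above.
dbstop in maxArrivalDelayMapper at 9   % pause just before add is called
dbstatus maxArrivalDelayMapper         % list the breakpoints set in this file
```

To remove the breakpoint later, use dbclear in maxArrivalDelayMapper at 9.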

Execute mapreduce

Run the mapreduce example file MaxMapReduceExample.m. Specify mapreducer(0) to ensure that the algorithm does not run in parallel, since parallel execution of mapreduce using Parallel Computing Toolbox™ ignores breakpoints.

mapreducer(0);
MaxMapReduceExample

MATLAB stops execution of the file when it encounters the breakpoint in the map function. During the pause in execution, you can hover over the different variable names in the map function, or type one of the variable names at the command line to inspect the values.

In this case, the display indicates that, as yet, there are no key-value pairs in intermKVStore.

Screenshot of hover text showing KeyValueStore with no key-value pairs.
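
Calling mapreducer(0) changes the global execution environment. As an alternative, you can keep the serial environment in a variable and pass it to mapreduce as the fourth input. This is a minimal sketch that repeats the relevant lines of the example script. It assumes that the mapper and reducer files are saved on the path and that airlinesmall.csv is available. The variable name mr is only illustrative.

```matlab
% Assumes maxArrivalDelayMapper.m and maxArrivalDelayReducer.m are on the path.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';

mr = mapreducer(0);   % serial execution in the local MATLAB session

% Pass the execution environment explicitly so that breakpoints are honored.
maxDelay = mapreduce(ds, @maxArrivalDelayMapper, @maxArrivalDelayReducer, mr);
```

With a breakpoint set in the mapper, this call pauses in the same way as running the example script.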

Step Through Map Function

  1. Continue past the breakpoint. You can use dbstep to execute a single line, or dbcont to continue execution until MATLAB encounters another breakpoint. Alternatively, you can click Step or Continue in the Editor tab. For more information about all the available options, see Debug MATLAB Code Files.
    In this case, use dbstep (or click Step) to execute only line 9, which adds a key-value pair to intermKVStore. Inspect the new display for intermKVStore.
    Screenshot of hover text showing KeyValueStore with one key-value pair.
  2. Now, use dbcont (or click Continue) to continue execution of mapreduce. During the next call to the map function, MATLAB halts again on line 9. The new display for intermKVStore indicates that it does not contain any key-value pairs, because the display is meant to show only the most recent key-value pairs that are added in the current call to the map (or reduce) function.
  3. Step past line 9 again using dbstep (or click Step) to add the next key-value pair to intermKVStore, and inspect the new display for the variable. MATLAB displays only the key-value pair added during the current call to the map function.
    Screenshot of hover text showing KeyValueStore with multiple key-value pairs. The display shows the last key-value pair added.
  4. Complete the debugging of the map function by removing the breakpoint and closing the file maxArrivalDelayMapper.m.
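
Each call to the map function in the steps above handles one chunk of the datastore. To preview what the mapper computes per chunk without the debugger, you can read the datastore in a loop. This is a rough sketch, not part of the original example. It assumes that a plain read loop yields chunks comparable to the ones mapreduce passes to the mapper, and the variable name partials is hypothetical.

```matlab
% Assumes airlinesmall.csv (shipped with MATLAB) is on the path.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';

partials = [];                              % one partial maximum per chunk
while hasdata(ds)
    data = read(ds);                        % table with one variable, ArrDelay
    partials(end+1) = max(data.ArrDelay);   %#ok<SAGROW> same computation as line 8 of the mapper
    fprintf('chunk %d: %d rows, partial max = %g\n', ...
        numel(partials), height(data), partials(end));
end
```

Each printed partial maximum plays the role of one value added under the key 'PartialMaxArrivalDelay'.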

Step Through Reduce Function

  1. You can use the same process to set breakpoints and step through execution of a reduce function. The reduce function for this example is maxArrivalDelayReducer.m. Open this file for editing.
    edit maxArrivalDelayReducer.m
  2. Set two breakpoints: one on line 10, and one on line 13. This enables you to inspect the ValueIterator and the final key-value pairs added to the output, outKVStore.
  3. Run the main example file.
  4. The execution of the example will pause when the breakpoint on line 10 is encountered. The debug display for the ValueIterator indicates the active key and whether any values remain to be retrieved.
    Screenshot of hover text showing ValueIterator with an active key and one or more values available.
  5. Now, remove the breakpoint on line 10 and use dbcont (or click Continue) to continue execution of the example until the next breakpoint is reached (on line 13). Since this reduce function continually compares each new value from the ValueIterator to the global maximum, mapreduce execution ends by adding a single key-value pair to outKVStore.
  6. Use dbstep (or click Step) to execute line 13 only. The display for outKVStore shows the global maximum value that mapreduce will return as the final answer.
    Screenshot of hover text showing KeyValueStore with one key-value pair added.
  7. Now use dbcont (or click Continue) to advance execution, enabling the example to finish running. mapreduce returns the final results.
    Map 100% Reduce 100%
    ans =
               Key             Value
        'MaxArrivalDelay'    [1014]
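
Once debugging is finished, you might want to rerun the example without pausing and pull the result out of the returned table. The sketch below assumes that the example script leaves the output datastore maxDelay in the base workspace and that the Value variable of the result is a cell array, as the bracketed display above suggests. Note that dbclear all removes every breakpoint in every file, not only the ones set in this example.

```matlab
dbclear all               % remove all breakpoints (affects every file)
mapreducer(0);            % serial execution, as in the steps above
MaxMapReduceExample       % script; assumed to define maxDelay in the workspace

result = readall(maxDelay);       % table with variables Key and Value
maxArrDelay = result.Value{1};    % unwrap the numeric value from the cell
disp(maxArrDelay)
```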

For a complete guide to debugging in MATLAB, see Debugging and Analysis.

See Also

mapreduce