matlab.io.datastore.Partitionable.partition - Partition a datastore - MATLAB

Class: matlab.io.datastore.Partitionable
Namespace: matlab.io.datastore

Syntax

`subds = partition(ds,n,index)`

Description

`subds = partition(ds,n,index)` partitions datastore `ds` into the number of parts specified by `n` and returns the partition corresponding to the index `index`. The partitioned datastore `subds` is of the same type as the input datastore `ds`.

Input Arguments

ds — Input datastore

matlab.io.Datastore object

Input datastore, specified as a matlab.io.Datastore object. To create a Datastore object, see matlab.io.Datastore.

n — Number of partitions

positive integer

Number of partitions, specified as a positive integer. To get a reasonable value for n, use the numpartitions function.

When you specify a value of n that is not in the range of partitions available for the datastore, the partition method returns an empty datastore. For more information, see Empty Datastores. For instance, if a datastore can hold up to 10 partitions, then the output of the partition method depends on the value of n.

Example: 3

Data Types: double

index — Index

positive integer

Index, specified as a positive integer.

Example: 1

Data Types: double
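The roles of `n` and `index` can be sketched with a built-in datastore. This is a minimal illustration, not part of the original page: it assumes `arrayDatastore` (R2020b or later) as a convenient stand-in because it needs no files on disk, and the value `n = 3` is chosen arbitrarily. In practice, `numpartitions(ds)` gives a reasonable value for `n`.

```matlab
% Stand-in datastore (assumption): 12 rows of numeric data held in memory.
ds = arrayDatastore((1:12)','OutputType','same');

n = 3;                  % number of partitions (arbitrary value for illustration)
parts = cell(n,1);

for index = 1:n
    % Each call returns the partition at position index out of n.
    % subds has the same type as ds.
    subds = partition(ds,n,index);
    parts{index} = readall(subds);
end

% Taken together, the partitions cover every row of the original datastore.
allRows = sort(vertcat(parts{:}));
```

Each `subds` can be read independently of the others, which is what makes the method useful for distributing work.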

Examples

Build Datastore with Parallel Processing Support

Build a datastore with parallel processing support and use it to bring your custom or proprietary data into MATLAB®. Then, process the data in a parallel pool.

Create a .m class definition file that contains the code implementing your custom datastore. You must save this file in your working folder or in a folder that is on the MATLAB® path. The name of the .m file must be the same as the name of your object constructor function. For example, if you want your constructor function to have the name MyDatastorePar, then the name of the .m file must be MyDatastorePar.m. The .m class definition file must contain the following steps:

Step 1: Inherit from the datastore classes.
Step 2: Define the constructor and the required methods.
Step 3: Implement your custom file reading function.

In addition to these steps, define any other properties or methods that you need to process and analyze your data.

%% STEP 1: INHERIT FROM DATASTORE CLASSES
classdef MyDatastorePar < matlab.io.Datastore & ...
        matlab.io.datastore.Partitionable

properties(Access = private)
    CurrentFileIndex double
    FileSet matlab.io.datastore.DsFileSet
end

% Property to support saving, loading, and processing of
% datastore on different file system machines or clusters.
% In addition, define the methods get.AlternateFileSystemRoots()
% and set.AlternateFileSystemRoots() in the methods section. 
properties(Dependent)
    AlternateFileSystemRoots
end

%% STEP 2: DEFINE THE CONSTRUCTOR AND THE REQUIRED METHODS
methods
    % Define your datastore constructor
    function myds = MyDatastorePar(location,altRoots)
        myds.FileSet = matlab.io.datastore.DsFileSet(location,...
            'FileExtensions','.bin', ...
            'FileSplitSize',8*1024);
        myds.CurrentFileIndex = 1;

        if nargin == 2
             myds.AlternateFileSystemRoots = altRoots;
        end
        
        reset(myds);
    end
    
    % Define the hasdata method
    function tf = hasdata(myds)
        % Return true if more data is available
        tf = hasfile(myds.FileSet);
    end
    
    % Define the read method
    function [data,info] = read(myds)
        % Read data and information about the extracted data
        % See also: MyFileReader()
        if ~hasdata(myds)
            msgII = ['Use the reset method to reset the datastore ',... 
                     'to the start of the data.']; 
            msgIII = ['Before calling the read method, ',...
                      'check if data is available to read ',...
                      'by using the hasdata method.'];
            error('No more data to read.\n%s\n%s',msgII,msgIII);
        end
        
        fileInfoTbl = nextfile(myds.FileSet);
        data = MyFileReader(fileInfoTbl);
        info.Size = size(data);
        info.FileName = fileInfoTbl.FileName;
        info.Offset = fileInfoTbl.Offset;
        
        % Update CurrentFileIndex for tracking progress
        if fileInfoTbl.Offset + fileInfoTbl.SplitSize >= ...
                fileInfoTbl.FileSize
            myds.CurrentFileIndex = myds.CurrentFileIndex + 1 ;
        end
    end
    
    % Define the reset method
    function reset(myds)
        % Reset to the start of the data
        reset(myds.FileSet);
        myds.CurrentFileIndex = 1;
    end

    % Define the partition method
    function subds = partition(myds,n,ii)
        subds = copy(myds);
        subds.FileSet = partition(myds.FileSet,n,ii);
        reset(subds);
    end
    
    % Getter for AlternateFileSystemRoots property
    function altRoots = get.AlternateFileSystemRoots(myds)
        altRoots = myds.FileSet.AlternateFileSystemRoots;
    end

    % Setter for AlternateFileSystemRoots property
    function set.AlternateFileSystemRoots(myds,altRoots)
        try
          % The DsFileSet object manages AlternateFileSystemRoots
          % for your datastore
          myds.FileSet.AlternateFileSystemRoots = altRoots;

          % Reset the datastore
          reset(myds);  
        catch ME
          throw(ME);
        end
    end
  
end

methods (Hidden = true)          
    % Define the progress method
    function frac = progress(myds)
        % Determine the fraction (0 to 1) of data read from the datastore
        if hasdata(myds) 
           frac = (myds.CurrentFileIndex-1)/...
                         myds.FileSet.NumFiles; 
        else 
           frac = 1;  
        end 
    end
end

methods(Access = protected)
    % If you use the  FileSet property in the datastore,
    % then you must define the copyElement method. The
    % copyElement method allows methods such as readall
    % and preview to remain stateless 
    function dscopy = copyElement(ds)
        dscopy = copyElement@matlab.mixin.Copyable(ds);
        dscopy.FileSet = copy(ds.FileSet);
    end
    
    % Define the maxpartitions method
    function n = maxpartitions(myds)
        n = maxpartitions(myds.FileSet);
    end
end

end

%% STEP 3: IMPLEMENT YOUR CUSTOM FILE READING FUNCTION
function data = MyFileReader(fileInfoTbl)
% create a reader object using FileName
reader = matlab.io.datastore.DsFileReader(fileInfoTbl.FileName);

% seek to the offset
seek(reader,fileInfoTbl.Offset,'Origin','start-of-file');

% read fileInfoTbl.SplitSize amount of data
data = read(reader,fileInfoTbl.SplitSize);

end

Your custom datastore is now ready. Use your custom datastore to read and process the data in a parallel pool.
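One way to process the data in parallel is sketched below. It is an illustration under stated assumptions rather than a prescribed workflow: it assumes MyDatastorePar.m from the example above is on the MATLAB path, and the temporary folder, file names, and file contents are invented purely so that the sketch has something to read.

```matlab
% Create a few small .bin files in a temporary folder (illustrative data only).
location = tempname;
mkdir(location);
for k = 1:3
    fid = fopen(fullfile(location,sprintf('file%d.bin',k)),'w');
    fwrite(fid,randi([0 255],20000,1),'uint8');
    fclose(fid);
end

% Assumes the MyDatastorePar class defined above is on the path.
ds = MyDatastorePar(location);

% Choose the number of partitions. With a parallel pool open,
% numpartitions(ds,gcp) can be used instead to account for the pool size.
n = numpartitions(ds);
bytesRead = zeros(n,1);

% parfor runs on pool workers when Parallel Computing Toolbox is
% available and falls back to a serial loop otherwise.
parfor ii = 1:n
    subds = partition(ds,n,ii);     % each iteration gets its own partition
    count = 0;
    while hasdata(subds)
        data = read(subds);         % one split of one file
        count = count + numel(data);
    end
    bytesRead(ii) = count;          % sliced output, one entry per partition
end
```

Each iteration works on its own partition, so no iteration depends on the read position of another.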

More About

Empty Datastores

An empty datastore is a datastore object that does not contain any records. For an empty datastore, your custom datastore methods must satisfy these conditions:

Non-Tall Dimensions

Dimensions other than the first dimension of the array. For an array of size 5-by-15-by-25, the tall dimension is 5 and the non-tall dimensions are 15 and 25.
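The definition can be checked directly with `size`. The short sketch below uses made-up variable names and also shows that an array with zero rows can still carry the same non-tall dimensions.

```matlab
A = zeros(5,15,25);

sz = size(A);
tallDim = sz(1);            % first (tall) dimension
nonTallDims = sz(2:end);    % remaining (non-tall) dimensions

% An empty array that keeps the non-tall dimensions of A.
emptyLike = zeros([0 nonTallDims]);
```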

Tips

Version History

Introduced in R2017b