matlab.io.Datastore - Base datastore class - MATLAB
Namespace: matlab.io
Description
matlab.io.Datastore is an abstract class for creating a custom datastore. A datastore helps access large collections of data iteratively, especially when data is too large to fit in memory. The Datastore abstract class declares and captures the interface expected for all custom datastores in MATLAB®. Derive your class using this syntax:
classdef MyDatastore < matlab.io.Datastore
    ...
end
To implement your custom datastore:
- Inherit from the class matlab.io.Datastore.
- Define the four required methods: hasdata, read, reset, and progress.
For more details and steps to create your custom datastore, see Develop Custom Datastore.
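As a minimal sketch of the two steps above, the following class serves the rows of an in-memory matrix in fixed-size chunks. The class name InMemoryDatastore, its properties, and the info.Rows field are hypothetical names chosen for illustration; only the base class and the four required method names come from this page.

```matlab
classdef InMemoryDatastore < matlab.io.Datastore
    % Hypothetical example class: serves rows of a matrix in chunks.

    properties (Access = private)
        Data        % matrix held in memory (assumption: numeric)
        ChunkRows   % number of rows returned by each read
        NextRow     % index of the next unread row
    end

    methods
        function ds = InMemoryDatastore(data,chunkRows)
            ds.Data = data;
            ds.ChunkRows = chunkRows;
            reset(ds);
        end

        function tf = hasdata(ds)
            % True while at least one row is still unread
            tf = ds.NextRow <= size(ds.Data,1);
        end

        function [data,info] = read(ds)
            if ~hasdata(ds)
                error('InMemoryDatastore:noMoreData', ...
                    'No more data to read. Use reset to start over.');
            end
            lastRow = min(ds.NextRow + ds.ChunkRows - 1,size(ds.Data,1));
            data = ds.Data(ds.NextRow:lastRow,:);
            info.Rows = [ds.NextRow lastRow];   % rows covered by this read
            ds.NextRow = lastRow + 1;
        end

        function reset(ds)
            % Return to the state before any data was read
            ds.NextRow = 1;
        end
    end

    methods (Hidden = true)
        function frac = progress(ds)
            % Fraction of rows already read, between 0 and 1
            if hasdata(ds)
                frac = (ds.NextRow - 1)/size(ds.Data,1);
            else
                frac = 1;
            end
        end
    end
end
```

Saved as InMemoryDatastore.m in a folder on the MATLAB path, the class could be exercised with ds = InMemoryDatastore(magic(4),3) followed by a while hasdata(ds) loop that calls read(ds). Because this sketch has no handle-valued properties, it does not override copyElement.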
Methods
Method | Description |
---|---|
read | Read data from the datastore. Syntax: [data,info] = read(ds). The data output can be any data type and must be vertically concatenateable. Best practice is to return the info output as a structure. The data type of the output data dictates the data type of the output of the tall function. Access: Public, Abstract: true |
hasdata | Determine if data is available to read. The output is of type logical. Syntax: tf = hasdata(ds). Access: Public, Abstract: true |
reset | Reset the datastore to an initial state before any data is read. Syntax: reset(ds). Access: Public, Abstract: true |
progress | Determine how much data is already read. The output is a scalar double between 0 and 1. A return value of 0.55 means that you have read 55% of the data. Syntax: p = progress(ds). Access: Public, Abstract: true, Hidden: true |
preview | Return a subset of the data. Syntax: data = preview(ds). The default implementation returns the first eight rows of data. The output has the same data type as the output of read. The default implementation of the preview method is not optimized for tall array construction. For improved tall array performance, optimize your implementation based on your data. Access: Public |
readall | Read all data in the datastore. Syntax: data = readall(ds). The output has the same data type as the output of read. If the data does not fit in memory, readall returns an error. The default implementation of the readall method is not optimized for tall array construction. For improved tall array performance, optimize your implementation based on your data. Access: Public |
combine | Combine data from multiple datastores. Syntax: dsnew = combine(ds1,ds2,...,dsN). The output dsnew is a new datastore with combined data, returned as a CombinedDatastore object. Use the ReadOrder="sequential" name-value argument to return a SequentialDatastore object that reads data sequentially. Access: Public |
transform | Transform the datastore. Syntax: dsnew = transform(ds,@fcn). The output dsnew is a new datastore with transformed data, returned as a TransformedDatastore object. Access: Public |
isPartitionable | Determine whether datastore is partitionable. The output is of type logical. Syntax: tf = isPartitionable(ds). Access: Public |
isSubsettable | Determine whether datastore is subsettable. The output is of type logical. Syntax: tf = isSubsettable(ds). Access: Public |
isShuffleable | Determine whether datastore is shuffleable. The output is of type logical. Syntax: tf = isShuffleable(ds). Access: Public |
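The inherited transform and combine methods can be tried without writing a custom class. The sketch below uses the built-in arrayDatastore function as a stand-in datastore, which is an assumption (it is not part of this page and requires a release that provides it); the variable names are illustrative only.

```matlab
% Stand-in datastores over small numeric columns (assumption: arrayDatastore
% is available; 'OutputType','same' makes read return numeric data).
dsA = arrayDatastore((1:10)','OutputType','same');
dsB = arrayDatastore((101:110)','OutputType','same');

% transform returns a new datastore that applies the function to each read
dsDoubled = transform(dsA,@(x) 2*x);
doubled = readall(dsDoubled);

% combine returns a new datastore whose reads pair up data from each input
dsPaired = combine(dsA,dsB);
paired = readall(dsPaired);

% The capability queries return logical values
tfPartition = isPartitionable(dsA);
```

Neither call modifies dsA or dsB; each returns a new datastore object that reads from the originals on demand.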
Properties
To add handle properties to your custom datastore, you must implement the copyElement method. For example, if you use the DsFileSet object as a property in your custom datastore, then implement the copyElement method. Implementing the copyElement method enables you to create a deep copy of the datastore object. For more information, see Customize Copy Operation. For an example implementation of the copyElement method, see Develop Custom Datastore.
Examples
Build Datastore to Read Binary Files
Build a datastore to bring your custom or proprietary data into MATLAB® for serial processing.
Create a .m class definition file that contains the code implementing your custom datastore. You must save this file in your working folder or in a folder that is on the MATLAB® path. The name of the .m file must be the same as the name of your object constructor function. For example, if you want your constructor function to have the name MyDatastore, then the name of the .m file must be MyDatastore.m. The .m class definition file must contain the following steps:
- Step 1: Inherit from the datastore classes.
- Step 2: Define the constructor and the required methods.
- Step 3: Define your custom file reading function.
In addition to these steps, define any other properties or methods that you need to process and analyze your data.
%% STEP 1: INHERIT FROM DATASTORE CLASSES
classdef MyDatastore < matlab.io.Datastore

    properties(Access = private)
        CurrentFileIndex double
        FileSet matlab.io.datastore.DsFileSet
    end

    % Property to support saving, loading, and processing of
    % datastore on different file system machines or clusters.
    % In addition, define the methods get.AlternateFileSystemRoots()
    % and set.AlternateFileSystemRoots() in the methods section.
    properties(Dependent)
        AlternateFileSystemRoots
    end

    %% STEP 2: DEFINE THE CONSTRUCTOR AND THE REQUIRED METHODS
    methods
        % Define your datastore constructor
        function myds = MyDatastore(location,altRoots)
            myds.FileSet = matlab.io.datastore.DsFileSet(location,...
                'FileExtensions','.bin', ...
                'FileSplitSize',8*1024);
            myds.CurrentFileIndex = 1;

            if nargin == 2
                myds.AlternateFileSystemRoots = altRoots;
            end

            reset(myds);
        end

        % Define the hasdata method
        function tf = hasdata(myds)
            % Return true if more data is available
            tf = hasfile(myds.FileSet);
        end

        % Define the read method
        function [data,info] = read(myds)
            % Read data and information about the extracted data
            % See also: MyFileReader()
            if ~hasdata(myds)
                error(sprintf(['No more data to read.\nUse the reset ',...
                    'method to reset the datastore to the start of ' ,...
                    'the data. \nBefore calling the read method, ',...
                    'check if data is available to read ',...
                    'by using the hasdata method.']))
            end

            fileInfoTbl = nextfile(myds.FileSet);
            data = MyFileReader(fileInfoTbl);
            info.Size = size(data);
            info.FileName = fileInfoTbl.FileName;
            info.Offset = fileInfoTbl.Offset;

            % Update CurrentFileIndex for tracking progress
            if fileInfoTbl.Offset + fileInfoTbl.SplitSize >= ...
                    fileInfoTbl.FileSize
                myds.CurrentFileIndex = myds.CurrentFileIndex + 1 ;
            end
        end

        % Define the reset method
        function reset(myds)
            % Reset to the start of the data
            reset(myds.FileSet);
            myds.CurrentFileIndex = 1;
        end

        % Getter for AlternateFileSystemRoots property
        function altRoots = get.AlternateFileSystemRoots(myds)
            altRoots = myds.FileSet.AlternateFileSystemRoots;
        end

        % Setter for AlternateFileSystemRoots property
        function set.AlternateFileSystemRoots(myds,altRoots)
            try
                % The DsFileSet object manages the AlternateFileSystemRoots
                % for your datastore
                myds.FileSet.AlternateFileSystemRoots = altRoots;

                % Reset the datastore
                reset(myds);
            catch ME
                throw(ME);
            end
        end
    end

    methods (Hidden = true)
        % Define the progress method
        function frac = progress(myds)
            % Determine percentage of data read from datastore
            if hasdata(myds)
                frac = (myds.CurrentFileIndex-1)/...
                    myds.FileSet.NumFiles;
            else
                frac = 1;
            end
        end
    end

    methods(Access = protected)
        % If you use the FileSet property in the datastore,
        % then you must define the copyElement method. The
        % copyElement method allows methods such as readall
        % and preview to remain stateless
        function dscopy = copyElement(ds)
            dscopy = copyElement@matlab.mixin.Copyable(ds);
            dscopy.FileSet = copy(ds.FileSet);
        end
    end
end

%% STEP 3: IMPLEMENT YOUR CUSTOM FILE READING FUNCTION
function data = MyFileReader(fileInfoTbl)
% create a reader object using FileName
reader = matlab.io.datastore.DsFileReader(fileInfoTbl.FileName);

% seek to the offset
seek(reader,fileInfoTbl.Offset,'Origin','start-of-file');

% read fileInfoTbl.SplitSize amount of data
data = read(reader,fileInfoTbl.SplitSize);
end
Your custom datastore is now ready. Use MyDatastore to create a datastore object for reading your binary data files.
Create Datastore Object Using Custom Datastore And Read Data
Use your custom datastore to preview and read your proprietary data into MATLAB for serial processing.
This example uses a simple data set to illustrate a workflow using your custom datastore. The data set is a collection of 15 binary (.bin) files where each file contains a column (1 variable) and 10000 rows (records) of unsigned integers.
binary_data01.bin  binary_data02.bin  binary_data03.bin  binary_data04.bin  binary_data05.bin
binary_data06.bin  binary_data07.bin  binary_data08.bin  binary_data09.bin  binary_data10.bin
binary_data11.bin  binary_data12.bin  binary_data13.bin  binary_data14.bin  binary_data15.bin
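To try the workflow without proprietary data, a stand-in data set with the same shape can be generated. This is a minimal sketch, assuming random uint8 values are an acceptable substitute; the temporary output folder and the random contents are assumptions, and only the file names, file count, and record count follow the description above.

```matlab
% Write 15 files, each holding 10000 unsigned 8-bit integers.
outDir = tempname;          % assumption: a temporary folder is acceptable
mkdir(outDir);

for k = 1:15
    fname = fullfile(outDir,sprintf('binary_data%02d.bin',k));
    fid = fopen(fname,'w');
    if fid == -1
        error('Could not open %s for writing.',fname);
    end
    % Random stand-in contents, one byte per record
    fwrite(fid,randi([0 255],10000,1,'uint8'),'uint8');
    fclose(fid);
end

listing = dir(fullfile(outDir,'*.bin'));
```

Changing the current folder to outDir, or passing fullfile(outDir,'*.bin') as the location, would then let the datastore find the files.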
Create a datastore object using the MyDatastore function. For implementation details of MyDatastore, see the example Build Datastore to Read Binary Files.
folder = fullfile('*.bin');
ds = MyDatastore(folder);
Preview the data from the datastore.
preview(ds)
ans = 8x1 uint8 column vector
   113
   180
   251
    91
    29
    66
   254
   214
Read the data in a while loop and use the hasdata method to check if more data is available to read.
while hasdata(ds)
    data = read(ds);
    % do something
end
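The "do something" step is typically where a running result is accumulated so that no more than one chunk is held in memory at a time. As a sketch, the hypothetical helper below (datastoreMean is not a MATLAB function) computes the mean of all values in a datastore, assuming that read returns numeric data.

```matlab
function m = datastoreMean(ds)
% Hypothetical helper: mean of all values in a datastore, computed
% chunk by chunk. Assumes read(ds) returns numeric data.
reset(ds);                  % start from the beginning of the data
total = 0;
count = 0;
while hasdata(ds)
    data = read(ds);
    % Convert to double so integer types such as uint8 do not saturate
    total = total + sum(double(data(:)));
    count = count + numel(data);
end
m = total/count;
end
```

Calling m = datastoreMean(ds) leaves ds fully read, because datastores are handle objects and the helper resets and advances the same object it is given; call reset(ds) afterward to read again.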
Reset the datastore to its initial state and read the data from the start of the datastore.
reset(ds);
data = read(ds);
Alternatively, if your data collection fits in memory, then read all the data in the datastore. Since the folder contains 15 files with 10000 records in each file, the size of the output should be 150000 records.
dataAll = readall(ds);
whos dataAll
  Name          Size            Bytes  Class    Attributes

  dataAll       150000x1       150000  uint8
Save and Load Datastore on Different Platforms
Create a custom datastore object, save it on a Windows® machine, and then load and process it on a Linux® machine.
Before creating and saving your custom datastore, identify the root path of your data on the different platforms. The root paths differ based on the machine or file system. For example, suppose you access the data using these root paths:
- "Z:\DataSet" on your local Windows machine
- "/nfs-bldg001/DataSet" on your Linux cluster
Associate these root paths using the AlternateFileSystemRoots property. For implementation details of MyDatastore, see the example Build Datastore to Read Binary Files.
altRoots = ["Z:\DataSet","/nfs-bldg001/DataSet"];
ds = MyDatastore('Z:\DataSet\*.bin',altRoots);
Examine the files in the datastore.
fileTbl = resolve(ds.FileSet);
fileTbl.FileName
ans =
12×1 cell array
{'Z:\DataSet\binary_data01.bin'}
{'Z:\DataSet\binary_data02.bin'}
{'Z:\DataSet\binary_data03.bin'}
.
.
.
Save the datastore.
save ds_saved_on_Windows.mat ds
Load the datastore on a Linux platform and examine the files in the datastore. Since the root path 'Z:\DataSet' is not accessible on the Linux cluster at load time, the datastore function automatically updates the root paths based on the values specified in the AlternateFileSystemRoots property.
load ds_saved_on_Windows.mat
fileTbl = resolve(ds.FileSet);
fileTbl.FileName
ans =
12×1 cell array
{'/nfs-bldg001/DataSet/binary_data01.bin'}
{'/nfs-bldg001/DataSet/binary_data02.bin'}
{'/nfs-bldg001/DataSet/binary_data03.bin'}
.
.
.
You can now process and analyze this datastore on your Linux machine.
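The effect of the root-path association can be pictured as a prefix swap on each file name. The snippet below is only an illustration of that idea using string functions, not the mechanism that DsFileSet actually uses internally; the variable names are hypothetical.

```matlab
% Illustration only: map a Windows file path to its Linux equivalent
% by swapping the root and converting the separators.
winRoot   = "Z:\DataSet";
linuxRoot = "/nfs-bldg001/DataSet";
winPath   = "Z:\DataSet\binary_data01.bin";

% Part of the path that follows the Windows root
relPart = extractAfter(winPath,strlength(winRoot));

% Attach it to the Linux root, using forward slashes
linuxPath = linuxRoot + replace(relPart,"\","/");
```

Here linuxPath matches the first entry in the file list shown after loading the datastore on Linux.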
Version History
Introduced in R2017b