matlab.io.datastore.DsFileSet
Namespace: matlab.io.datastore
File-set object for collection of files in datastore
Description
The DsFileSet object helps you manage the iterative processing of large collections of files. Use the DsFileSet object together with the DsFileReader object to manage and read files from your datastore.
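Here is a minimal sketch of the two objects working together. It assumes that the demos folder used in the example on this page contains .mat files. The DsFileSet supplies one file at a time, and a DsFileReader opened on each file name reads that file's first bytes. The 16-byte header length is an illustrative choice, not a requirement.

```matlab
% Sketch: pair a DsFileSet with DsFileReader to read the first bytes of each file.
% Assumption: the demos folder contains .mat files, as in the example on this page.
folder = fullfile(matlabroot,'toolbox','matlab','demos');
fs = matlab.io.datastore.DsFileSet(folder,'FileExtensions','.mat');

headerBytes = 16;                  % illustrative number of bytes to inspect
headers = cell(fs.NumFiles,1);     % one entry per file
k = 0;
while hasfile(fs)
    info = nextfile(fs);           % table with FileName, FileSize, Offset, SplitSize
    fr = matlab.io.datastore.DsFileReader(info.FileName);
    k = k + 1;
    % read returns uint8 bytes by default; never request more than the file holds
    headers{k} = read(fr, min(headerBytes, info.FileSize));
end
```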
Construction
`fs` = matlab.io.datastore.DsFileSet(location) returns a DsFileSet object for a collection of files based on the specified location.
`fs` = matlab.io.datastore.DsFileSet(location,Name,Value) specifies additional parameters for the DsFileSet object using one or more name-value pair arguments. Name can also be a property name, and Value is the corresponding value. Name must appear inside quotes. You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
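The sketch below shows both calling forms. It reuses the demos folder from the example on this page, which is an assumption about your installation. The first call passes only location. The second call adds name-value arguments and then displays the documented NumFiles and FileSplitSize properties.

```matlab
% Sketch: the two construction forms of DsFileSet.
folder = fullfile(matlabroot,'toolbox','matlab','demos');

% location only: every file directly inside the folder
fsAll = matlab.io.datastore.DsFileSet(folder);

% location plus name-value arguments
fsMat = matlab.io.datastore.DsFileSet(folder, ...
    'IncludeSubfolders',true, ...  % also look inside subfolders
    'FileExtensions','.mat');      % keep only .mat files

disp(fsAll.NumFiles)       % number of files the first object manages
disp(fsMat.NumFiles)       % number of .mat files, including subfolders
disp(fsMat.FileSplitSize)  % 'file' because FileSplitSize was not specified
```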
Input Arguments
location
— Files or folders to include
character vector | cell array of character vectors | string | struct
Files or folders to include in the file-set object, specified as a character vector, cell array of character vectors, string, or struct. If the files are not in the current folder, then location must contain full or relative paths. Files within subfolders of the specified folder are not included unless you set IncludeSubfolders to true.
When you specify location as a struct, typically in a Hadoop® workflow, the struct must contain the fields FileName, Offset, and Size. This requirement enables you to use the location argument directly with the initializeDatastore method of the matlab.io.datastore.HadoopLocationBased class. For an example, see Add Support for Hadoop.
You can use the wildcard character (*) when specifying location. Specifying this character includes all matching files or all files in the matching folders in the file-set object.
If the files are not available locally, then the full path of the files or folders must be a uniform resource locator (URL), such as `hdfs://hostname:portnumber/pathtofile`.
Data Types: char | cell | string | struct
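The following sketch uses a wildcard in location to select .mat files from the demos folder, which is assumed to exist as in the example on this page. It then lists the matches with the resolve method.

```matlab
% Sketch: select files with a wildcard in location instead of FileExtensions.
pattern = fullfile(matlabroot,'toolbox','matlab','demos','*.mat');
fs = matlab.io.datastore.DsFileSet(pattern);

allFiles = resolve(fs);    % information on every matching file
disp(allFiles.FileName)
```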
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'FileExtensions',{'.jpg','.tif'} includes all files with a .jpg or .tif extension in the DsFileSet object.
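For comparison, here is a sketch that passes the same options in both name-value syntaxes, assuming the demos folder from the example on this page. The Name=Value form requires R2021a or later. In earlier releases the whole script fails to parse.

```matlab
% Sketch: identical options written in both name-value syntaxes.
folder = fullfile(matlabroot,'toolbox','matlab','demos');

% Comma-separated form, names in quotes
fsComma = matlab.io.datastore.DsFileSet(folder,'FileExtensions',{'.mat'},'IncludeSubfolders',true);

% Name=Value form (R2021a and later)
fsEquals = matlab.io.datastore.DsFileSet(folder,FileExtensions=".mat",IncludeSubfolders=true);

fprintf('%d files vs %d files\n', fsComma.NumFiles, fsEquals.NumFiles);
```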
FileExtensions
— File extensions
character vector | cell array of character vectors | string
File extensions, specified as the comma-separated pair consisting of 'FileExtensions' and a character vector, cell array of character vectors, or string. Use empty quotes ('') to represent files without extensions.
If you do not specify 'FileExtensions', then DsFileSet includes files with any extension.
Example: 'FileExtensions','.jpg'
Example: 'FileExtensions',{'.txt','.csv'}
Data Types: char | cell | string
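The sketch below combines several extensions, including '' for files that have no extension. The demos folder is assumed to exist as in the example on this page. The loop then checks which extensions the selected files actually have.

```matlab
% Sketch: select files by several extensions, including files with no extension ('').
folder = fullfile(matlabroot,'toolbox','matlab','demos');
exts = {'.mat','.m',''};               % '' matches files without an extension
fs = matlab.io.datastore.DsFileSet(folder,'FileExtensions',exts);

allFiles = resolve(fs);
names = string(allFiles.FileName);     % convert to a string array for uniform handling
found = strings(numel(names),1);
for k = 1:numel(names)
    [~,~,found(k)] = fileparts(names(k));   % extension of each selected file
end
disp(unique(found))
```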
IncludeSubfolders
— Subfolder inclusion flag
false (default) | true
Subfolder inclusion flag, specified as the comma-separated pair consisting of 'IncludeSubfolders' and true or false. Specify true to include all files and subfolders within each folder, or false to include only the files within each folder.
Example: 'IncludeSubfolders',true
Data Types: logical | double
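The following sketch compares file counts with and without IncludeSubfolders on the demos folder, which is assumed to hold .mat files at its top level. The subfolder count can never be smaller.

```matlab
% Sketch: compare the number of files with and without subfolders.
folder = fullfile(matlabroot,'toolbox','matlab','demos');
fsTop = matlab.io.datastore.DsFileSet(folder,'FileExtensions','.mat');
fsAll = matlab.io.datastore.DsFileSet(folder,'FileExtensions','.mat','IncludeSubfolders',true);

fprintf('%d files at the top level, %d including subfolders\n', ...
    fsTop.NumFiles, fsAll.NumFiles);
```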
AlternateFileSystemRoots
— Alternate file system root paths
string vector | cell array
Alternate file system root paths, specified as the name-value argument consisting of "AlternateFileSystemRoots" and a string vector or a cell array. Use "AlternateFileSystemRoots" when you create a datastore on a local machine but need to access and process the data on another machine, possibly one with a different operating system. Also, if you process data using Parallel Computing Toolbox™ and MATLAB® Parallel Server™, and the data is stored on your local machine with a copy available on cloud or cluster machines of a different platform, you must use "AlternateFileSystemRoots" to associate the root paths.
- To associate a set of root paths that are equivalent to one another, specify "AlternateFileSystemRoots" as a string vector. For example, ["Z:\datasets","/mynetwork/datasets"]
- To associate multiple sets of root paths that are equivalent for the datastore, specify "AlternateFileSystemRoots" as a cell array containing multiple rows, where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:
  - Specify "AlternateFileSystemRoots" as a cell array of string vectors.
    {["Z:\datasets", "/mynetwork/datasets"];...
     ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}
  - Alternatively, specify "AlternateFileSystemRoots" as a cell array of cell arrays of character vectors.
    {{'Z:\datasets','/mynetwork/datasets'};...
     {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}

The value of "AlternateFileSystemRoots" must satisfy these conditions:
- Contains one or more rows, where each row specifies a set of equivalent root paths.
- Each row specifies multiple root paths and each root path must contain at least two characters.
- Root paths are unique and are not subfolders of one another.
- Contains at least one root path entry that points to the location of the files.
For more information, see Set Up Datastore for Processing on Different Machines or Clusters.
Example: ["Z:\datasets","/mynetwork/datasets"]
Data Types: string | cell
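The sketch below registers one equivalent root for the folder that holds the files. The second root, "/mynetwork/datasets", is a hypothetical path on another machine, borrowed from the examples above. Only the first root needs to exist locally, which satisfies the condition that at least one root points to the location of the files.

```matlab
% Sketch: associate the local demos folder with a hypothetical remote root.
% Assumption: "/mynetwork/datasets" is a made-up path on another machine.
localRoot = string(fullfile(matlabroot,'toolbox','matlab','demos'));
roots = [localRoot, "/mynetwork/datasets"];   % one set of equivalent roots

fs = matlab.io.datastore.DsFileSet(localRoot, ...
    'FileExtensions','.mat', ...
    'AlternateFileSystemRoots',roots);

disp(fs.NumFiles)
```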
Properties
NumFiles
— Number of files
numeric scalar
This property is read-only.
Number of files in the file-set object, returned as a numeric scalar.
Example: fs.NumFiles
Data Types: double
FileSplitSize
— Split size
'file' (default) | numeric scalar
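This property is read-only. Set it when you create the object by using the 'FileSplitSize' name-value argument.
Split size, specified as 'file' or a numeric scalar that gives the number of bytes in each split.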
The value of FileSplitSize determines the output of the nextfile method.
- If FileSplitSize is 'file', then the nextfile method returns a table with FileName, FileSize, Offset, and SplitSize. The value of SplitSize is equal to FileSize.
- If FileSplitSize is a numeric scalar n, then the nextfile method returns FileName, FileSize, Offset, and SplitSize, where SplitSize is equal to FileSplitSize. Use this information to read n bytes of the file. Subsequent calls to the nextfile method return information for reading the next n bytes of the same file, until the end of the file.
Example: 'FileSplitSize',20
Data Types: double | char
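As an illustration of numeric split sizes, the following sketch reads every .mat file in the demos folder in 1024-byte chunks. It seeks a DsFileReader to each chunk's Offset and then reads SplitSize bytes. Two assumptions apply: Offset is a zero-based byte position from the start of the file, and the final chunk of a file may hold fewer than n bytes, so the read is clamped to the bytes remaining. The split size itself is an arbitrary choice.

```matlab
% Sketch: read every file in fixed-size byte chunks using nextfile output.
% Assumptions: Offset is a zero-based byte position; the last chunk may be short.
folder = fullfile(matlabroot,'toolbox','matlab','demos');
fs = matlab.io.datastore.DsFileSet(folder,'FileExtensions','.mat','FileSplitSize',1024);

totalBytes = 0;
while hasfile(fs)
    info = nextfile(fs);            % one chunk: FileName, FileSize, Offset, SplitSize
    fr = matlab.io.datastore.DsFileReader(info.FileName);
    seek(fr, info.Offset, 'Origin', 'start-of-file');           % jump to this chunk
    nBytes = min(info.SplitSize, info.FileSize - info.Offset);  % clamp the last chunk
    if nBytes > 0
        chunk = read(fr, nBytes);   % uint8 bytes by default
        totalBytes = totalBytes + numel(chunk);
    end
end
```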
Methods
| Method | Description |
|---|---|
| hasfile | Determine if more files are available in file-set object |
| maxpartitions | Maximum number of partitions |
| nextfile | Information on next file or file chunk |
| partition | Partition file-set object |
| subset | Create subset of datastore or FileSet |
| reset | Reset the file-set object |
| resolve | Information on all files in file-set object |
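Here is a sketch of maxpartitions and partition working together, for example to give each parallel worker its own share of the files. The demos folder is an assumption carried over from the example below. With the default 'file' split size, the part sizes are expected to add up to NumFiles.

```matlab
% Sketch: split a file-set object into parts, for example one per worker.
folder = fullfile(matlabroot,'toolbox','matlab','demos');
fs = matlab.io.datastore.DsFileSet(folder,'FileExtensions','.mat');

N = min(2, maxpartitions(fs));    % never request more parts than are available
counts = zeros(N,1);
for idx = 1:N
    part = partition(fs, N, idx); % DsFileSet holding the idx-th part
    counts(idx) = part.NumFiles;
end
disp(counts)
```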
Examples
Get File Information for Collection of Files
Create a file-set object, get file information one file at a time, or get information for all the files in the file-set object.
Create a file-set object for all the .mat files in the demos folder and its subfolders.
folder = fullfile(matlabroot,'toolbox','matlab','demos');
fs = matlab.io.datastore.DsFileSet(folder,...
    'IncludeSubfolders',true,...
    'FileExtensions','.mat');
Obtain information for the first and second files from the file-set object.
fTable1 = nextfile(fs); % first file
fTable2 = nextfile(fs); % second file
Obtain information on all the files by getting information for one file at a time and collecting the information into a table.
ft = cell(fs.NumFiles,1); % using cell for efficiency
i = 1;
reset(fs); % reset to the beginning of the fileset
while hasfile(fs)
ft{i} = nextfile(fs);
i = i + 1;
end
allFiles = vertcat(ft{:});
Alternatively, obtain information on all files at the same time by using the resolve method.
allFiles = resolve(fs);
Tips
- If you use the DsFileSet object as a property in your custom datastore, then implement the copyElement method. Implementing the copyElement method enables you to create a deep copy of the datastore object. For more information, see Customize Copy Operation. For an example implementation of the copyElement method, see Develop Custom Datastore.
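The following sketch illustrates why a deep copy matters. It assumes, as the copyElement guidance implies, that DsFileSet is a handle object that supports copy. Assigning the object to another variable shares its iteration position, while copy creates an independent object.

```matlab
% Sketch: plain assignment vs copy for a DsFileSet.
% Assumption: DsFileSet is a handle object that supports copy.
folder = fullfile(matlabroot,'toolbox','matlab','demos');
fs = matlab.io.datastore.DsFileSet(folder,'FileExtensions','.mat');

alias = fs;           % same handle: shares the iteration position
snapshot = copy(fs);  % independent object with its own position

while hasfile(fs)     % consume every file through the original
    nextfile(fs);
end

fprintf('alias has files: %d, copy has files: %d\n', hasfile(alias), hasfile(snapshot));
```

Inside a custom datastore, a copyElement override typically applies the same idea to its file-set property. For example, after calling `copyElement@matlab.mixin.Copyable(ds)` it assigns `dscopy.FileSet = copy(ds.FileSet);`, where FileSet is the hypothetical property name.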
Version History
Introduced in R2017b