matlab.io.datastore.DsFileSet - File-set object for collection of files in datastore

Namespace: matlab.io.datastore

File-set object for collection of files in datastore

Description

The DsFileSet object helps you manage the iterative processing of large collections of files. Use the DsFileSet object together with the DsFileReader object to manage and read files from your datastore.

Construction

`fs` = matlab.io.datastore.DsFileSet(location) returns a DsFileSet object for a collection of files based on the specified location.

`fs` = matlab.io.datastore.DsFileSet(location,Name,Value) specifies additional parameters for the DsFileSet object using one or more name-value pair arguments. Name can also be a property name, and Value is the corresponding value. Name must appear inside single quotes (''). You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Input Arguments


location — Files or folders to include

character vector | cell array of character vectors | string | struct

Files or folders to include in the file-set object, specified as a character vector, cell array of character vectors, string, or struct. If the files are not in the current folder, then location must specify full or relative paths. Files within subfolders of the specified folder are not automatically included in the file-set object.

Typically for a Hadoop® workflow, when you specify location as a struct, it must contain the fields FileName, Offset, and Size. This requirement enables you to use the location argument directly with the initializeDatastore method of the matlab.io.datastore.HadoopLocationBased class. For an example, see Add Support for Hadoop.

You can use the wildcard character (*) when specifying location. Specifying this character includes all matching files or all files in the matching folders in the file-set object.

If the files are not available locally, then the full path of the files or folders must be a uniform resource locator (URL), such as
hdfs://hostname:portnumber/pathtofile.

Data Types: char | cell | string | struct
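
As a minimal sketch of the forms of location described above, the following code creates a scratch folder and passes it as a folder path, as a cell array of file paths, and as a wildcard pattern. The folder and file names are hypothetical and exist only for illustration.

```matlab
% Scratch data: the folder and file names below are hypothetical.
folder = tempname;
mkdir(folder);
names = {'a.txt','b.txt','c.csv'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder,names{k}),'w');
    fprintf(fid,'%d\n',k);
    fclose(fid);
end

% 1) A folder, specified as a character vector.
fsFolder = matlab.io.datastore.DsFileSet(folder);

% 2) Specific files, specified as a cell array of character vectors.
fsFiles = matlab.io.datastore.DsFileSet( ...
    {fullfile(folder,'a.txt'), fullfile(folder,'c.csv')});

% 3) A wildcard pattern that matches only the .txt files.
fsWild = matlab.io.datastore.DsFileSet(fullfile(folder,'*.txt'));

% Compare how many files each form picked up.
fprintf('%d %d %d\n',fsFolder.NumFiles,fsFiles.NumFiles,fsWild.NumFiles);
```

Comparing NumFiles across the three objects is a quick way to confirm that a pattern or file list matched what you intended.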

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'FileExtensions',{'.jpg','.tif'} includes all files with a .jpg or .tif extension in the file-set object.

FileExtensions — File extensions

character vector | cell array of character vectors | string

File extensions, specified as the comma-separated pair consisting of 'FileExtensions' and a character vector, cell array of character vectors, or string. You can use empty quotes ('') to represent files without extensions.

If 'FileExtensions' is not specified, then DsFileSet automatically includes all file extensions.

Example: 'FileExtensions','.jpg'

Example: 'FileExtensions',{'.txt','.csv'}

Data Types: char | cell | string

IncludeSubfolders — Subfolder inclusion flag

false (default) | true

Subfolder inclusion flag, specified as the comma-separated pair consisting of 'IncludeSubfolders' and true or false. Specify true to include all files and subfolders within each folder, or false to include only the files within each folder.

Example: 'IncludeSubfolders',true

Data Types: logical | double
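
The effect of 'IncludeSubfolders' can be sketched by comparing NumFiles with and without the flag. The scratch folder layout and file names here are hypothetical.

```matlab
% Scratch layout (hypothetical names): one file at the top level
% and one file inside a subfolder.
top = tempname;
sub = fullfile(top,'nested');
mkdir(sub);                        % also creates the parent folder
files = {fullfile(top,'top.txt'), fullfile(sub,'deep.txt')};
for k = 1:numel(files)
    fid = fopen(files{k},'w');
    fprintf(fid,'sample\n');
    fclose(fid);
end

% Default behavior: files in subfolders are not included.
fsFlat = matlab.io.datastore.DsFileSet(top);

% Include files from subfolders as well.
fsDeep = matlab.io.datastore.DsFileSet(top,'IncludeSubfolders',true);

fprintf('Without subfolders: %d file(s)\n',fsFlat.NumFiles);
fprintf('With subfolders:    %d file(s)\n',fsDeep.NumFiles);
```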

AlternateFileSystemRoots — Alternate file system root paths

string vector | cell array

Alternate file system root paths, specified as the name-value argument consisting of "AlternateFileSystemRoots" and a string vector or a cell array. Use "AlternateFileSystemRoots" when you create a datastore on a local machine but need to access and process the data on another machine (possibly with a different operating system). Also, when you process data using Parallel Computing Toolbox™ and MATLAB® Parallel Server™, and the data is stored on your local machines with a copy available on cloud or cluster machines running a different platform, you must use "AlternateFileSystemRoots" to associate the root paths.

The value of "AlternateFileSystemRoots" must satisfy these conditions:

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell
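
A minimal sketch of passing a set of equivalent root paths follows. It assumes a local scratch folder as the root that actually holds the files; "/mynetwork/datasets" is a hypothetical root on another machine and does not need to exist for the object to be created locally.

```matlab
% Hypothetical setup: a scratch folder stands in for the local root.
localRoot = tempname;
mkdir(localRoot);
fid = fopen(fullfile(localRoot,'data.txt'),'w');
fprintf(fid,'sample\n');
fclose(fid);

% One row of equivalent roots: the local root and its (hypothetical)
% counterpart on the other machine.
altRoots = [string(localRoot), "/mynetwork/datasets"];

fs = matlab.io.datastore.DsFileSet(localRoot, ...
    'AlternateFileSystemRoots',altRoots);

disp(fs.NumFiles)
```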

Properties


NumFiles — Number of files

numeric scalar

This property is read-only.

Number of files in the file-set object, specified as a numeric scalar.

Example: fs.NumFiles

Data Types: double

FileSplitSize — Split Size

'file' (default) | numeric scalar

This property is read-only.

Split size, specified as 'file' or a numeric scalar.

The value assigned to FileSplitSize dictates the output from the nextfile method.

Example: 'FileSplitSize',20

Data Types: double | char
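
To illustrate how FileSplitSize shapes the output of nextfile, this sketch writes a single 50-byte file (a hypothetical name in a scratch folder) and reads its file information in chunks of at most 20 bytes. It assumes only that nextfile returns one table per call, as described in the Methods section.

```matlab
% Scratch data: a single 50-byte file with a hypothetical name.
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder,'blob.bin'),'w');
fwrite(fid,repmat('x',1,50));      % 50 one-byte characters
fclose(fid);

% Request file information in chunks of 20 bytes instead of whole files.
fs = matlab.io.datastore.DsFileSet(folder,'FileSplitSize',20);

nChunks = 0;
while hasfile(fs)
    chunk = nextfile(fs);          % table describing the next chunk
    disp(chunk)
    nChunks = nChunks + 1;
end
fprintf('nextfile returned %d chunk(s) for one file\n',nChunks);
```

With the default value 'file', the same loop would instead return one table per file.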

Methods

| Method | Description |
| --- | --- |
| hasfile | Determine if more files are available in file-set object |
| maxpartitions | Maximum number of partitions |
| nextfile | Information on next file or file chunk |
| partition | Partition file-set object |
| subset | Create subset of datastore or FileSet |
| reset | Reset the file-set object |
| resolve | Information on all files in file-set object |
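
As a rough sketch of how several of these methods fit together, the following code splits a file set into two partitions and iterates over the first one. The scratch folder and file names are hypothetical, and the choice of two partitions is arbitrary.

```matlab
% Scratch data: four small files with hypothetical names.
folder = tempname;
mkdir(folder);
for k = 1:4
    fid = fopen(fullfile(folder,sprintf('part%d.txt',k)),'w');
    fprintf(fid,'%d\n',k);
    fclose(fid);
end

fs = matlab.io.datastore.DsFileSet(folder);

% Upper bound on how many partitions the file set supports.
n = maxpartitions(fs);

% Take the first of two partitions and walk through its files.
firstHalf = partition(fs,2,1);
while hasfile(firstHalf)
    info = nextfile(firstHalf);
    disp(info)
end

% Rewind so that the partition can be read again from the start.
reset(firstHalf);
```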

Examples


Get File Information for Collection of Files

Create a file-set object, then get file information one file at a time or for all the files in the file-set object at once.

Create a file-set object for all the .mat files from the demos folder.

    folder = fullfile(matlabroot,'toolbox','matlab','demos');
    fs = matlab.io.datastore.DsFileSet(folder,...
        'IncludeSubfolders',true,...
        'FileExtensions','.mat');

Obtain information for the first and second file from the file-set object.

    fTable1 = nextfile(fs); % first file
    fTable2 = nextfile(fs); % second file

Obtain information on all the files by getting information for one file at a time and collect the information into a table.

    ft = cell(fs.NumFiles,1); % using cell for efficiency
    i = 1;
    reset(fs); % reset to the beginning of the fileset
    while hasfile(fs)
        ft{i} = nextfile(fs);
        i = i + 1;
    end
    allFiles = vertcat(ft{:});

Alternatively, obtain information on all files at the same time.
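
One way to do this is with the resolve method listed under Methods. The sketch below recreates the same file-set object so that it runs on its own. The variable name allFilesAtOnce is illustrative.

```matlab
% Same file set as in the steps above: all .mat files under the demos folder.
folder = fullfile(matlabroot,'toolbox','matlab','demos');
fs = matlab.io.datastore.DsFileSet(folder, ...
    'IncludeSubfolders',true, ...
    'FileExtensions','.mat');

% Get information on every file in a single call.
allFilesAtOnce = resolve(fs);

% Show the first few rows of the result.
disp(head(allFilesAtOnce))
```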

Tips

Version History

Introduced in R2017b