TabularTextDatastore - Datastore for tabular text files - MATLAB (original) (raw)
Datastore for tabular text files
Description
Use a TabularTextDatastore
object to manage large collections of text files containing column-oriented or tabular data where the collection does not necessarily fit in memory. Tabular data is data that is arranged in a rectangular fashion with each row having the same number of entries. You can create aTabularTextDatastore
object using thetabularTextDatastore
function, specify its properties, and then import and process the data using object functions.
Creation
Syntax
Description
`ttds` = tabularTextDatastore([location](#d126e350101))
creates a datastore from the collection of data specified bylocation
.
`ttds` = tabularTextDatastore([location](#d126e350101),[Name,Value](#namevaluepairarguments))
specifies additional parameters and properties for ttds
using one or more name-value pair arguments. For example,tabularTextDatastore(location,"FileExtensions",[".txt",".csv"])
creates a datastore from only the files in location
with extensions .txt
and .csv
.
Input Arguments
location
— Files or folders included in datastore
FileSet
| DsFileSet
object | string array | character vector | cell array of character vectors
Files or folders to include in the datastore, specified as one of these values:
FileSet
object — Specifying the location as aFileSet
object leads to a faster construction time for datastores compared to specifying a path orDsFileSet
object. For more information, see matlab.io.datastore.FileSet.DsFileSet
object — For more information, see matlab.io.datastore.DsFileSet.- File path — You can specify a single file path as a string scalar or character vector. You can specify multiple file paths as a string array or cell array of character vectors.
Files or folders can be local or remote:
- Local files or folders — If the files are not in the current folder, then specify full or relative paths. Files within subfolders of a specified folder are not automatically included in the datastore. You can use the wildcard character (*) when specifying the local path. This character specifies that the datastore include all matching files or all files in the matching folders.
- Remote files or folders — Specify full paths to remote files or folders as a uniform resource locator (URL) of the form
hdfs:///_`pathtofile`_
. Internet URLs must include the protocol type"http://"
or"https://"
. For more information, see Work with Remote Data.
When you specify a folder, the datastore includes only files with supported file formats and ignores files with any other format. To specify a custom list of file extensions to include in your datastore, see the FileExtensions
name-value argument.
The tabularTextDatastore
function supports files with .txt
, .csv
,.dat
, .dlm
,.asc
, and .text
extensions as well as files with no extension.
Example: "file1.csv"
Example: "../dir/data/file1"
Example: ["C:\dir\data\file1.csv","C:\dir\data\file2.dat"]
Example: "C:\dir\data\*.text"
Name-Value Arguments
Specify optional pairs of arguments asName1=Value1,...,NameN=ValueN
, where Name
is the argument name and Value
is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: ttds = tabularTextDatastore("C:\dir\textdata",FileExtensions=[".csv",".txt"])
Before R2021a, use commas to separate each name and value, and enclose Name
in quotes.
Example: ttds = tabularTextDatastore("C:\dir\textdata","FileExtensions",[".csv",".txt"])
FileExtensions
— Text file extensions
character vector | cell array of character vectors | string scalar | string array
Text file extensions, specified as the comma-separated pair consisting of "FileExtensions"
and a character vector, cell array of character vectors, string scalar, or string array. The specified extensions do not require a supported format. If you want to include unsupported extensions, then specify all extensions. Use empty quotes ""
to represent files without extensions.
Example: "FileExtensions",".txt"
Example: "FileExtensions",[".text",".csv"]
Data Types: char
| cell
| string
IncludeSubfolders
— Subfolder inclusion flag
true
or false
| 0 or 1
Subfolder inclusion flag, specified as the name-value argument consisting of "IncludeSubfolders"
andtrue
, false
, 0, or 1. Specify true
to include all files and subfolders within each folder or false
to include only the files within each folder.
When you do not specify "IncludeSubfolders"
, then the default value is false
.
Example: "IncludeSubfolders",true
Data Types: logical
| double
OutputType
— Output datatype
"auto"
(default) | "table"
| "timetable"
Output datatype, specified as the name-value argument consisting of"OutputType"
and one of these values:
"auto"
— Detects if the output from the datastore should be a table or a timetable based on whether you specify the"RowTimes"
name-value argument. If you specify"RowTimes"
then the output is a timetable; otherwise, the output is a table."table"
— Return a table."timetable"
— Return a timetable.
The value of OutputType
determines the data type returned by the preview, read, and readall functions. Use this option in conjunction with the"RowTimes"
name-value pair to return timetables fromTabularTextDatastore
.
Example: "OutputType","timetable"
Data Types: char
| string
AlternateFileSystemRoots
— Alternate file system root paths
string vector | cell array
Alternate file system root paths, specified as the name-value argument consisting of"AlternateFileSystemRoots"
and a string vector or a cell array. Use"AlternateFileSystemRoots"
when you create a datastore on a local machine, but need to access and process the data on another machine (possibly of a different operating system). Also, when processing data using the Parallel Computing Toolbox™ and the MATLAB® Parallel Server™, and the data is stored on your local machines with a copy of the data available on different platform cloud or cluster machines, you must use"AlternateFileSystemRoots"
to associate the root paths.
- To associate a set of root paths that are equivalent to one another, specify
"AlternateFileSystemRoots"
as a string vector. For example,
["Z:\datasets","/mynetwork/datasets"] - To associate multiple sets of root paths that are equivalent for the datastore, specify
"AlternateFileSystemRoots"
as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:- Specify
"AlternateFileSystemRoots"
as a cell array of string vectors.
{["Z:\datasets", "/mynetwork/datasets"];...
["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]} - Alternatively, specify
"AlternateFileSystemRoots"
as a cell array of cell array of character vectors.
{{'Z:\datasets','/mynetwork/datasets'};...
{'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}
- Specify
The value of "AlternateFileSystemRoots"
must satisfy these conditions:
- Contains one or more rows, where each row specifies a set of equivalent root paths.
- Each row specifies multiple root paths and each root path must contain at least two characters.
- Root paths are unique and are not subfolders of one another.
- Contains at least one root path entry that points to the location of the files.
For more information, see Set Up Datastore for Processing on Different Machines or Clusters.
Example: ["Z:\datasets","/mynetwork/datasets"]
Data Types: string
| cell
TextType
— Output data type of text variables
"char"
(default) | "string"
Output data type of text variables, specified as the comma-separated pair consisting of "TextType"
and either "char"
or "string"
. If the output table from the read
,readall
, or preview
functions contains text variables, then"TextType"
specifies the data type of those variables for TabularTextDatastore
. If"TextType"
is "char"
, then the output is a cell array of character vectors. If"TextType"
is "string"
, then the output has type string
.
Data Types: char
| string
DatetimeType
— Type for imported date and time data
"datetime"
(default) | "text"
Type for imported date and time data, specified as the comma-separated pair consisting of "DatetimeType"
and one of these values: "datetime"
or"text"
.
Value | Type for Imported Date and Time Data |
---|---|
"datetime" | MATLABdatetime data typeFor more information, see datetime. |
"text" | If "DatetimeType" is specified as "text", then the type for imported date and time data depends on the value specified in the"TextType" property: If "TextType" is"char", then thetabularTextdatastore imports dates as a cell array of character vectors.If "TextType" is"string", then thetabularTextdatastore imports dates as an array of strings. |
If the specified TextscanFormats
property contains a %D
, then thetabularTextdatastore
ignores the value specified in DatetimeType
.
Example: "DatetimeType","datetime"
Data Types: char
| string
DurationType
— Output data type of duration data
"duration"
(default) | "text"
Output data type of duration data, specified as the name-value argument consisting of "DurationType"
and either"duration"
or "text"
.
Value | Type for Imported Duration Data |
---|---|
"duration" | MATLABduration data typeFor more information, see duration. |
"text" | If "DurationType" is specified as "text", then the type for imported duration data depends on the value specified in the"TextType" parameter: If "TextType" is"char", then the importing function returns duration data as a cell array of character vectors. If "TextType" is"string", then the importing function returns duration data as an array of strings. |
Data Types: char
| string
DatetimeLocale
— Locale to interpret dates
"en_US"
(default) | character vector | string scalar
Locale to interpret dates, specified as a character vector or string scalar. The DatetimeLocale
value determines how the importing function interprets text that represents dates and times.
When specifying the DatetimeLocale
, use the form_`xx`__ _`YY`_
, where xx
is a lowercase ISO 639-1 two-letter code that specifies a language, and_YY
_ is an uppercase ISO 3166-1 alpha-2 code that specifies a country.
This table lists some common values for the locale.
Locale | Language | Country |
---|---|---|
"de_DE" | German | Germany |
"en_GB" | English | United Kingdom |
"en_US" | English | United States |
"es_ES" | Spanish | Spain |
"fr_FR" | French | France |
"it_IT" | Italian | Italy |
"ja_JP" | Japanese | Japan |
"ko_KR" | Korean | Korea |
"nl_NL" | Dutch | Netherlands |
"zh_CN" | Chinese (simplified) | China |
Note
The Locale
value determines how input values are interpreted. The display format and language is specified by the Locale option in the Datetime format section of the Preferences window. To change the default datetime locale, see Set Command Window Preferences.
Data Types: char
| string
In addition to these name-value pairs, you also can specify the properties on this page as name-value pairs, with the exception of theFiles
property.
Properties
TabularTextDatastore
properties describe the files associated with a TabularTextDatastore
object. Specifically, the properties describe the format of the data in the files and control how the data should be read from the datastore. When you create a TabularTextDatastore
object, the datastore function uses the first file in the Files
property to determine the values of the properties. With the exception of theFiles
property, you can specify the value ofTabularTextDatastore
properties using name-value pair arguments when you create the datastore object. To view or modify a property after creating the object, use the dot notation:
ds = tabularTextDatastore("airlinesmall.csv"); ds.TreatAsMissing = "NA"; ds.MissingValue = 0;
File Properties
Files
— Files included in datastore
cell array of character vectors | string array
Files included in the datastore, resolved as a cell array of character vectors or a string array, where each character vector or string is a full path to a file. The location
argument in thetabularTextDatastore
anddatastore
functions define these files.
The first file specified by the Files
property determines the variable names and format information for all files in the datastore.
When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore
properties.
Example: {"C:\dir\data\mydata1.csv";"C:\dir\data\mydata2.csv"}
Data Types: cell
| string
FileEncoding
— File encoding
"UTF-8"
(default) | "US-ASCII"
| "Macintosh"
| ...
File encoding, specified as a character vector or a string scalar like one of these values.
"IBM866" | "ISO-8859-1" | "windows-847" |
---|---|---|
"KOI8-R" | "ISO-8859-2" | "windows-1250" |
"KOI8-U" | "ISO-8859-3" | "windows-1251" |
"Macintosh" | "ISO-8859-4" | "windows-1252" |
"US-ASCII" | "ISO-8859-5" | "windows-1253" |
"UTF-8" | "ISO-8859-6" | "windows-1254" |
"ISO-8859-7" | "windows-1255" | |
"ISO-8859-8" | "windows-1256" | |
"ISO-8859-9" | "windows-1257" | |
"ISO-8859-11" | "windows-1258" | |
"ISO-8859-13" | ||
"ISO-8859-15" |
If each file in the datastore fits into memory, thenFileEncoding
also can be one of these values.
"Big5" | "EUC-KR" | "GB18030" | "Shift_JIS" |
---|---|---|---|
"Big5-HKSCS" | "EUC-JP" | "GB2312" | "windows-949" |
"CP949" | "EUC-TW" | "GBK" |
When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore
properties.
Data Types: char
| string
ReadVariableNames
— Read variable names
true
| false
Read variable names, specified as a logical true
orfalse
.
- If unspecified, the
tabularTextDatastore
function detects the presence of variable names automatically. - If
true
, then the first nonheader row of the first file determines the variable names for the data. - If
false
, then the first nonheader row of the first file contains the first row of data. The data is assigned default variable names,Var1
,Var2
, and so on.
When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore
properties.
Data Types: logical
VariableNamingRule
— Flag to preserve variable names
"modify"
(default) | "preserve"
Flag to preserve variable names, specified as either "modify"
or"preserve"
.
"modify"
— Convert invalid variable names (as determined by the isvarname function) to valid MATLAB identifiers."preserve"
— Preserve variable names that are not valid MATLAB identifiers such as variable names that include spaces and non-ASCII characters.
Starting in R2019b, variable names and row names can include any characters, including spaces and non-ASCII characters. Also, they can start with any characters, not just letters. Variable and row names do not have to be valid MATLAB identifiers (as determined by the isvarname function). To preserve these variable names and row names, set the value of VariableNamingRule
to "preserve"
. Variable names are not refreshed when the value of VariableNamingRule
is changed from "modify"
to "preserve"
.
Data Types: char
| string
VariableNames
— Names of variables
cell array of character vectors | string array
Names of variables in the datastore, specified as a cell array of character vectors or a string array. Specify the variable names in the order in which they appear in the files. If you do not specify the variable names, they are detected from the first nonheader line in the first file of the datastore. When modifying theVariableNames
property, the number of new variable names must match the number of original variable names.
To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of theVariableNamingRule
parameter to"preserve"
.
If ReadVariableNames
is false
, then VariableNames
defaults to{"Var1","Var2", ...}
.
Example: {"Time","Name","Quantity"}
Data Types: cell
| string
Text Format Properties
— Number of lines to skip
non-negative integer
Number of lines to skip at the beginning of the file, specified as a nonnegative integer. If unspecified, thetabularTextDatastore
function detects the number of lines to skip automatically.
The tabularTextDatastore
function ignores the specified number of header lines before reading the variable names or data.
When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore
properties.
Data Types: double
Delimiter
— Field delimiter characters
character vector | cell array of character vectors | string scalar | string array
Field delimiter characters, specified as a character vector, cell array of character vectors, string scalar, or string array. Specify multiple delimiters in a cell array of character vectors or a string array. If unspecified, the tabularTextDatastore
function detects the delimiter automatically.
Example: "|"
Example: {";","*"}
Repeated delimiter characters in a file are interpreted as separate delimiters with empty fields between them. If unspecified, the read function detects the delimiter automatically by default.
When you specify one of the following escape sequences as a delimiter, it is converted to the corresponding control character.
\b | Backspace |
---|---|
\n | Newline |
\r | Carriage return |
\t | Tab |
\\ | Backslash (\) |
When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore
properties.
Data Types: char
| cell
| string
RowDelimiter
— Row delimiter character
\r\n
(default) | character vector | string scalar
Row delimiter character, specified as a character vector or string scalar that must be either a single character or one of"\r"
, "\n"
, or"\r\n"
.
When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore
properties.
Example: ":"
Data Types: char
| string
TreatAsMissing
— Text to treat as missing values
""
(default) | character vector | cell array of character vectors | string scalar | string array
Text to treat as missing values, specified as a single character vector, cell array of character vectors, string scalar, or string array. Values specified as TreatAsMissing
are substituted with the value defined in the MissingValue
property. For instance, if MissingValue
is defined to be aNaN
, and the TreatAsMissing
is specified as "NA"
. Then, in the imported data, all occurrences of "NA"
are replaced byNaN
.
This option only applies to numeric fields. Also, this property is equivalent to the TreatAsEmpty
name-value pair argument for the textscan
function.
When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore
properties.
Example: "NA"
Example: {"-",""}
Data Types: char
| cell
| string
MissingValue
— Value for missing numeric fields
NaN
(default) | scalar
Value for missing numeric fields in delimited text files, specified as a scalar. This property is equivalent to theEmptyValue
name-value pair argument for thetextscan
function.
Data Types: double
Advanced Text Format Properties
TextscanFormats
— Data field format
cell array of character vectors | string array
Data field format, specified as a cell array of character vectors or a string array, where each character vector or string contains one conversion specifier.
When you specify or modify the TextscanFormats
property, you can use the same conversion specifiers that the textscan function accepts for the formatSpec
argument. Valid values forTextscanFormats
include conversion specifiers that skip fields using an asterisk (*) character and ones that skip literal text. The number of conversion specifiers must match the number of variables in the VariableNames
property.
- If the value of
TextscanFormats
includes conversion specifiers that skip fields using asterisk characters (*), then the value of theSelectedVariableNames
property automatically updates. MATLAB uses the%*q
conversion specifier to skip fields omitted by theSelectedVariableNames
property and treats the field contents as literal character vectors. For fixed-width files, indicate a skipped field using the appropriate conversion specifier along with the field width. For example,%*52c
skips a field that contains 52 characters. - If you do not specify a value for
TextscanFormats
, thendatastore
determines the format of the data fields by scanning text from the first nonheader line in the first file of the datastore. - Starting in R2020b,
datastore
detects prefixed literals as hexadecimal and binary data. Previously,datastore
detected prefixed literals as text data.
Example: {"%s","%s","%f"}
Data Types: cell
| string
ExponentCharacters
— Exponent characters
"eEdD"
(default) | character vector | string scalar
Exponent characters, specified as a character vector or string scalar. The default exponent characters are e
,E
, d
, andD
.
Data Types: char
| string
— Comment character
""
(default) | character vector | string scalar | two-element array
Comment character used to distinguish comments in the file, specified as character vector, string scalar, or two-element string or cell array.
- If you specify a character vector or string scalar, then all following text on the same line are interpreted as a comment. For example,
"CommentStyle","/*"
interprets all text after/*
on the same line as a comment. - If you specify a two-element string vector or a two-element cell array containing character vectors, then all text between the two characters are interpreted as a comment. For example,
"CommentStyle",{"/*", "*/"}
interprets all text between"/*"
and"*/"
as a comment.
When reading from a TabularTextDatastore
, theread
function checks for comments only at the start of each field, not within a field.
When you change the value of this property, thedatastore
function reevaluates the values of theTabularTextDatastore
properties.
Example: "CommentStyle",{"/*", "*/"}
Data Types: char
| cell
| string
Whitespace
— White-space characters
" \b\t"
(default) | character vector | string scalar
White-space characters, specified as a character vector or a string scalar of one or more characters.
When you specify one of the following escape sequences as any white-space character, the datastore
function converts that sequence to the corresponding control character.
\b | Backspace |
---|---|
\n | Newline |
\r | Carriage return |
\t | Tab |
\\ | Backslash (\) |
When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore
properties.
Example: " \b\t"
Data Types: char
| string
MultipleDelimitersAsOne
— Multiple delimiter handling
0 (false)
(default) | 1 (true)
Multiple delimiter handling, specified as eithertrue
or false
. Iftrue
, then datastore
treats consecutive delimiters as a single delimiter. Repeated delimiters separated by white-space are also treated as a single delimiter.
When you change the value of this property, the datastore function reevaluates the values of the TabularTextDatastore
properties.
Properties for preview
, read
, readall
Table
SelectedVariableNames
— Variables to read
cell array of character vectors | string array
Variables to read from the file, specified as a cell array of character vectors or a string array, where each character vector or string contains the name of one variable. You can specify the variable names in any order.
To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of theVariableNamingRule
parameter to"preserve"
.
Example: ["Var3","Var7","Var4"]
Data Types: cell
| string
SelectedFormats
— Formats of selected variables
cell array of character vectors | string array
Formats of the selected variables to read, specified as a cell array of character vectors or a string array, where each character vector or string contains one conversion specifier. The variables to read are indicated by the SelectedVariableNames
property. The number of character vectors or strings inSelectedFormats
must match the number of variables to read.
You can use the same conversion specifiers that thetextscan
function accepts, including specifiers that skip literal text. However, you cannot use a conversion specifier that skips a field. That is, the conversion specifier cannot include an asterisk character (*).
Example: ["%d","%d"]
Data Types: cell
| string
ReadSize
— Amount of data to read
20000 (default) | positive scalar | "file"
Amount of data to read in a call to the read
function, specified as a positive scalar or "file"
.
- If
ReadSize
is a positive integer, then each call toread
reads at mostReadSize
rows. - If
ReadSize
is"file"
, then each call toread
reads all of the data in one file.
When you change ReadSize
from a numeric scalar to "file"
or vice versa, MATLAB resets the datastore to the state where no data has been read from it.
Data Types: double
| char
| string
TextType
— Output data type of text variables
"char"
(default) | "string"
Output data type of text variables, specified as"char"
or "string"
.TextType
specifies the data type of text variables formatted with %s
, %q
, or [...]
.
- If
TextType
is"char"
, then the output is a cell array of character vectors. - If
TextType
is"string"
, then the output has typestring
.
Data Types: char
| string
RowTimes
— Name of row times variable
variable name | variable index
Name of row times variable, specified as the name-value argument consisting of"RowTimes"
and a variable name (such as"Date"
) or a variable index (such as 3
).
RowTimes
is a timetable-related parameter. Each row of a timetable is associated with a time, which is captured in a time vector for the timetable. The variable specified in RowTimes
must contain adatetime
or a duration
vector.
If the value of "OutputType"
is "timetable"
, but you do not specify "RowTimes"
, then TabularTextDatastore
uses the first datetime
or duration
variable as the row times for the timetable.
Properties for use by writeall
Folders
— Folders used to construct datastore
cell array of character vectors
This property is read-only.
Folders used to construct datastore, returned as a cell array of character vectors. The cell array is oriented as a column vector. Each character vector is a path to a folder that contains data files. Thelocation
argument in thetabularTextDatastore
anddatastore
functions definesFolders
when the datastore is created.
The Folders
property is reset when you modify theFiles
property of aTabularTextDatastore
object.
Data Types: cell
SupportedOutputFormats
— Formats supported for writing
string row vector
This property is read-only.
Formats supported for writing, returned as a row vector of strings. This property specifies the possible output formats when using writeall to write output files from the datastore.
DefaultOutputFormat
— Default output format
string scalar
This property is read-only.
Default output format, returned as a string scalar. This property specifies the default format when using writeall to write output files from the datastore.
Data Types: string
Object Functions
Examples
Select Variables to Read
Using either a FileSet object or a file path, create a datastore from the sample file airlinesmall_subset.csv
. Specify how to treat missing values. Select only variables of interest.
Create a FileSet object from the file airlinesmall_subset.csv
. Create a datastore where values of "NA"
are treated as missing and missing values are treated as 0
.
fs = matlab.io.datastore.FileSet("airlinesmall_subset.csv"); ds = tabularTextDatastore(fs,"TreatAsMissing","NA",... "MissingValue",0);
View the variables in the datastore.
ans = 1x19 cell {'Year'} {'Month'} {'DayofMonth'} {'DayOfWeek'} {'DepTime'} {'CRSDepTime'} {'ArrTime'} {'CRSArrTime'} {'UniqueCarrier'} {'FlightNum'} {'TailNum'} {'ActualElapsedTime'} {'CRSElapsedTime'} {'AirTime'} {'ArrDelay'} {'DepDelay'} {'Origin'} {'Dest'} {'Distance'}
Modify the SelectedVariableNames
property to specify the variables of interest.
ds.SelectedVariableNames = ["Year","Month","TailNum"];
Alternatively, you can specify the location of your data as a file path.
ds = tabularTextDatastore("airlinesmall_subset.csv");
Using either location format, you can also specify your variables of interest when you create the datastore.
ds = tabularTextDatastore("airlinesmall_subset.csv","TreatAsMissing","NA",... "MissingValue",0,"SelectedVariableNames",["Year","Month","TailNum"]);
Specify Format of Data to Read
Create a datastore from the sample file airlinesmall_subset.csv
, which contains tabular data.
ds = tabularTextDatastore("airlinesmall_subset.csv","TreatAsMissing","NA",... "MissingValue",0);
Specify the variables of interest.
ds.SelectedVariableNames = ["Year","Month","UniqueCarrier"];
View the SelectedFormats
property.
ans = 1x3 cell {'%f'} {'%f'} {'%q'}
The SelectedFormats
property specifies how the tabularTextDatastore
function interprets the format of the variables. The Year
and Month
variables are read as columns of floating-point values and the UniqueCarrier
variable as a column of text.
Modify the SelectedFormats
property to read the first two variables as signed integers and the third variable as a categorical value.
ds.SelectedFormats = ["%d","%d","%C"];
Preview the data.
T=8×3 table Year Month UniqueCarrier ____ _____ _____________
1996 1 HP
1996 1 HP
1996 1 HP
1996 1 HP
1996 1 US
1996 1 US
1996 1 US
1996 1 NW
Return Timetable from Tabular Text Datastore
Use the OutputType
and RowTimes
name-value pairs to make tabulartextDatastore
return timetables instead of tables.
Create a datastore for outages.csv
. Specify the "OutputType
" name-value pair as "timetable
".
ttds = tabularTextDatastore("outages.csv","OutputType","timetable"); preview(ttds)
ans=8×5 timetable
OutageTime Region Loss Customers RestorationTime Cause
________________ _____________ ______ __________ ________________ ___________________
2002-02-01 12:18 {'SouthWest'} 458.98 1.8202e+06 2002-02-07 16:50 {'winter storm' }
2003-01-23 00:49 {'SouthEast'} 530.14 2.1204e+05 NaT {'winter storm' }
2003-02-07 21:15 {'SouthEast'} 289.4 1.4294e+05 2003-02-17 08:14 {'winter storm' }
2004-04-06 05:44 {'West' } 434.81 3.4037e+05 2004-04-06 06:10 {'equipment fault'}
2002-03-16 06:18 {'MidWest' } 186.44 2.1275e+05 2002-03-18 23:23 {'severe storm' }
2003-06-18 02:49 {'West' } 0 0 2003-06-18 10:54 {'attack' }
2004-06-20 14:39 {'West' } 231.29 NaN 2004-06-20 19:16 {'equipment fault'}
2002-06-06 19:28 {'West' } 311.86 NaN 2002-06-07 00:51 {'equipment fault'}
When you do not also specify 'RowTimes'
, tabularTextDatastore
uses the first datetime or duration variable as the row times. In this case, the OutageTime
variable is used for the row times.
Specify the 'RowTimes'
option to use the restoration times (RestorationTime
variable) as the row times, instead of the time of the power outages.
ttds = tabularTextDatastore("outages.csv","OutputType","timetable","RowTimes","RestorationTime"); preview(ttds)
ans=8×5 timetable
RestorationTime Region OutageTime Loss Customers Cause
________________ _____________ ________________ ______ __________ ___________________
2002-02-07 16:50 {'SouthWest'} 2002-02-01 12:18 458.98 1.8202e+06 {'winter storm' }
NaT {'SouthEast'} 2003-01-23 00:49 530.14 2.1204e+05 {'winter storm' }
2003-02-17 08:14 {'SouthEast'} 2003-02-07 21:15 289.4 1.4294e+05 {'winter storm' }
2004-04-06 06:10 {'West' } 2004-04-06 05:44 434.81 3.4037e+05 {'equipment fault'}
2002-03-18 23:23 {'MidWest' } 2002-03-16 06:18 186.44 2.1275e+05 {'severe storm' }
2003-06-18 10:54 {'West' } 2003-06-18 02:49 0 0 {'attack' }
2004-06-20 19:16 {'West' } 2004-06-20 14:39 231.29 NaN {'equipment fault'}
2002-06-07 00:51 {'West' } 2002-06-06 19:28 311.86 NaN {'equipment fault'}
Limitations
- Datetime data containing day, month, or time zone names in a language foreign to the
en_US
locale are not supported. For unrecognized datetime formats, specify the format using theTextscanFormats
parameter. - MATLAB does not support internet URLs that require authentication.
- MATLAB Online™ supports internet URLs associated with Microsoft® OneDrive™ files and folders, while the installed version of MATLAB supports only local OneDrive files.
Version History
Introduced in R2014b
R2024b: Read data over HTTP and HTTPS using datastore functions
You can read data from primary online sources by performing datastore operations over an internet URL.