rowfilter - Selectively import rows of interest - MATLAB (original) (raw)
Selectively import rows of interest
Since R2022a
Description
A RowFilter
object enables you to specify how MATLAB® imports tabular data. The object contains properties that control the data import process by specifying conditions that must be satisfied for rows to be included. You can create a RowFilter
object by using the rowfilter
function.
Creation
Syntax
Description
rf = rowfilter([varnames](#mw%5Ff1cb3847-b279-48a2-95b7-4485a9372866))
filters rows within your data set using the variables specified in varnames
. You can use the returned RowFilter
object rf
as an input toparquetread
and parquetDatastore
to filter data before importing, or as a row subscript when subscripting into a table or timetable.
Each variable name within the input argument becomes an individual property of theRowFilter
object. Use tab-completion for efficient access to available variable names. Use relational operators to express conditions on these variables to target your rows of interest. These relational operators are supported:<
, <=
, >
,>=
, ==
, and ~=
. To express multiple filtering constraints, combine RowFilter
objects using the&
, |
, or ~
operators. The~
operator can also be used with individual constraints.
rf = rowfilter([info](#mw%5F74141c6f-ad73-4918-b3f7-2eec227186b2))
uses the variable names from the VariableNames
property of a ParquetInfo
object.
rf = rowfilter([pds](#mw%5F03602782-b118-41fb-8c20-04cf59351b21))
uses the variable names from the VariableNames
property of aParquetDatastore
object.
rf = rowfilter([T](#mw%5F519d8028-5bdb-4f3d-aed2-7537aeeb8c81))
uses the variable names from the input table or timetable. (since R2023a)
Input Arguments
Names of variables used to filter rows, specified as a string scalar, character vector, string array, or cell array of character vectors.
Example: ["OutageTime" "Region" "Cause"]
ParquetInfo
object used to filter rows, specified as aParquetInfo
object. The variables identified by theVariableNames
property of the ParquetInfo
object are used to filter rows.
ParquetDatastore
object used to filter rows, specified as aParquetDatastore
object. The variables identified by theVariableNames
property of the ParquetDatastore
object are used to filter rows.
Since R2023a
Input table or timetable, specified as a table or timetable. IfT
is a timetable, you can filter on row times as well as variables.
Examples
Import a subset of data by specifying variables and rows to import by using a row filter.
To import a subset of the outages.parquet
file, create a filter to import only the OutageTime
, Region
, and Cause
variables. Then, refine the filter to import only rows with values that meet certain conditions.
rf = rowfilter(["OutageTime" "Region" "Cause"]); rf2 = (rf.OutageTime > datetime("2013-02-01")) & (rf.Region == "NorthEast") & (rf.Cause == "winter storm"); d = parquetread("outages.parquet",RowFilter=rf2,SelectedVariableNames=["OutageTime" "Region" "Cause"])
d=6×3 table
OutageTime Region Cause
____________________ ___________ ______________
09-Feb-2013 00:55:00 "NorthEast" "winter storm"
13-Feb-2013 01:44:00 "NorthEast" "winter storm"
25-Dec-2013 11:24:00 "NorthEast" "winter storm"
30-Dec-2013 11:40:00 "NorthEast" "winter storm"
22-Feb-2013 02:17:00 "NorthEast" "winter storm"
23-Feb-2013 01:53:00 "NorthEast" "winter storm"
The resulting subset of filtered data contains only the 6 rows that meet the filter conditions and the 3 specified variables.
Import the rows of interest from a data set using a row filter and ParquetInfo
object.
Create a ParquetInfo
object from the outages.parquet
file. Create a row filter using the ParquetInfo
object. Then, use the row filter to import rows with Loss
values less than 100.
info = parquetinfo("outages.parquet"); rf = rowfilter(info); data = parquetread("outages.parquet",RowFilter=rf.Loss<100)
data=321×6 table
Region OutageTime Loss Customers RestorationTime Cause
___________ ____________________ ______ __________ ____________________ __________________
"West" 18-Jun-2003 02:49:00 0 0 18-Jun-2003 10:54:00 "attack"
"SouthEast" 05-Sep-2004 17:48:00 73.387 36073 05-Sep-2004 20:46:00 "equipment fault"
"SouthEast" 01-Sep-2002 18:22:00 95.917 36759 01-Sep-2002 19:12:00 "severe storm"
"NorthEast" 18-Sep-2004 05:54:00 0 0 NaT "equipment fault"
"SouthEast" 12-Dec-2002 18:08:00 46.918 1.0698e+05 14-Dec-2002 18:43:00 "winter storm"
"West" 16-Dec-2002 13:43:00 70.752 4.8193e+05 19-Dec-2002 09:38:00 "winter storm"
"SouthEast" 24-Feb-2003 06:13:00 0 0 24-Feb-2003 21🔞00 "attack"
"SouthEast" 07-May-2005 00:07:00 65.95 47140 07-May-2005 09:26:00 "thunder storm"
"MidWest" 25-Jun-2003 14:03:00 77.432 1.2328e+05 28-Jun-2003 07:02:00 "thunder storm"
"SouthEast" 14-Jul-2002 21:32:00 90.83 60133 14-Jul-2002 23:53:00 "thunder storm"
"MidWest" 05-Mar-2002 17:53:00 96.563 2.8666e+05 10-Mar-2002 14:41:00 "wind"
"MidWest" 26-Sep-2004 01:14:00 58.656 1.2288e+05 01-Oct-2004 01:36:00 "wind"
"SouthEast" 21-Jan-2006 21:27:00 19.513 10994 21-Jan-2006 23:25:00 "equipment fault"
"MidWest" 19-Jan-2004 12:37:00 16.03 2.2296 20-Jan-2004 02:50:00 "equipment fault"
"NorthEast" 14-Jan-2004 20:01:00 87.08 14979 16-Jan-2004 00:52:00 "energy emergency"
"NorthEast" 04-Feb-2005 00:53:00 32.061 46182 09-Feb-2005 02:42:00 "winter storm"
⋮
The filtered data subset contains 321 rows and 6 variables.
Import the rows of interest from a data set using a row filter and ParquetDatastore
object.
Create a ParquetDatastore
object from the outages.parquet
file. Create a row filter using the ParquetDatastore
object. Then, use the row filter to select rows with Customer
values greater than 200,000 and Region
values of "MidWest"
.
pds = parquetDatastore("outages.parquet"); rf = rowfilter(pds); pds.RowFilter = rf.Customers > 2e6 & rf.Region == "MidWest"; data = readall(pds)
data=3×6 table
Region OutageTime Loss Customers RestorationTime Cause
_________ ____________________ ______ __________ ____________________ ______________
"MidWest" 10-Dec-2002 10:45:00 14493 3.0879e+06 11-Dec-2002 18:06:00 "unknown"
"MidWest" 16-Jul-2006 00:05:00 1817.9 3.295e+06 27-Jul-2006 14:42:00 "severe storm"
"MidWest" 07-Sep-2008 23:35:00 NaN 3.972e+06 19-Sep-2008 17:19:00 "severe storm"
The filtered data subset contains 3 rows and 6 variables.
Import data from a comma-separated value (CSV) file into a table.
T = readtable("outages.csv",TextType="string");
Create a row filter from the table.
Subscript on rows where Region
equals "West"
and Loss
is greater than 5000.
T(rf.Region == "West" & rf.Loss > 5000,:)
ans=4×6 table
Region OutageTime Loss Customers RestorationTime Cause
______ ________________ ______ __________ ________________ __________________
"West" 2012-07-16 08:37 7516.8 2.145e+05 2012-07-16 15:10 "equipment fault"
"West" 2010-05-17 09:10 8496.6 2.0768e+06 2010-05-18 22:43 "equipment fault"
"West" 2011-05-21 10:42 16659 7919.1 2011-05-22 06:18 "equipment fault"
"West" 2012-07-11 19:34 5481.7 NaN 2012-07-11 23:58 "energy emergency"
Version History
Introduced in R2022a
You can create a RowFilter
object by callingrowfilter
with a table or timetable input argument. The resulting row filter uses the variable names from the input table or timetable to filter rows of a data set.