Clean Timetable with Missing, Duplicate, or Nonuniform Times - MATLAB & Simulink (original) (raw)

This example shows how to create a regular timetable from one that has missing, duplicate, or nonuniform times. A timetable is a type of table that associates a time-stamp, or row time, with each row of data. In a regular timetable, the row times are sorted and unique, and differ by the same regular time step.

Also, some toolboxes have functions that work on regularly spaced time series data in the form of numeric arrays. So the example also shows how to export the data from a timetable for use with other functions.

There are a number of issues with row times that can make timetables irregular. The row times can be missing. They can be out of order. They can be duplicates, creating multiple rows with the same time that might have the same or different data. And even when they are present, sorted, and unique, they can differ by time steps of different sizes.

Timetables provide a number of different ways to resolve missing, duplicate, or nonuniform times, and to resample or aggregate data to create regular row times.

Workflow for cleaning timetables with MATLAB functions.

Load Timetable

Load a sample timetable from the MAT-file badTimes that contains weather measurements taken over several hours on June 9, 2016. The timetable TT includes temperature, rainfall, and wind speed measurements taken at irregular times during that day.

TT=12×3 timetable Time Temp Rain WindSpeed ____________________ ____ ____ _________

09-Jun-2016 06:01:04      73    0.01       2.3   
09-Jun-2016 07:59:23      59    0.08       0.9   
09-Jun-2016 09:53:57      59    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   
NaT                       56       0         0   
09-Jun-2016 09:53:57      67    0.03       3.4   
09-Jun-2016 08:49:10      62    0.01       2.7   
09-Jun-2016 08:49:10    75.8    0.01       2.7   
09-Jun-2016 08:49:10      82    0.01       2.7   
09-Jun-2016 05:03:11    66.2    0.05         3   
09-Jun-2016 08:49:10    67.2    0.01       2.7   
09-Jun-2016 04:12:00    58.8     NaN       NaN   

Find and Remove Rows with Missing Row Times

One way to begin is by finding and removing rows that have a NaT, or missing value, as the row time. To find missing values in the vector of row times, use ismissing. The ismissing function returns a logical vector that contains 1 wherever TT.Time has a missing value.

natRowTimes = ismissing(TT.Time)

natRowTimes = 12×1 logical array

0 0 0 0 1 0 0 0 0 0 0 0 ⋮

To keep only those rows that do not have missing values as row times, index into TT using ~natRowTimes as row indices. Assign those rows to a new timetable, goodRowTimesTT.

goodRowTimesTT = TT(~natRowTimes,:)

goodRowTimesTT=11×3 timetable Time Temp Rain WindSpeed ____________________ ____ ____ _________

09-Jun-2016 06:01:04      73    0.01       2.3   
09-Jun-2016 07:59:23      59    0.08       0.9   
09-Jun-2016 09:53:57      59    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   
09-Jun-2016 08:49:10      62    0.01       2.7   
09-Jun-2016 08:49:10    75.8    0.01       2.7   
09-Jun-2016 08:49:10      82    0.01       2.7   
09-Jun-2016 05:03:11    66.2    0.05         3   
09-Jun-2016 08:49:10    67.2    0.01       2.7   
09-Jun-2016 04:12:00    58.8     NaN       NaN   

This method removes only the rows that have missing row times. The timetable variables still might have missing data values. For example, the last row of goodRowTimesTT has NaN values for the Rain and Windspeed variables.

Remove Rows with Missing Times and Missing Data

As an alternative, you can remove both missing row times and missing data values at the same time by using the rmmissing function. rmmissing removes any timetable rows that have missing row times, missing data values, or both.

Display the missing row time and missing data values of TT.

TT=12×3 timetable Time Temp Rain WindSpeed ____________________ ____ ____ _________

09-Jun-2016 06:01:04      73    0.01       2.3   
09-Jun-2016 07:59:23      59    0.08       0.9   
09-Jun-2016 09:53:57      59    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   
NaT                       56       0         0   
09-Jun-2016 09:53:57      67    0.03       3.4   
09-Jun-2016 08:49:10      62    0.01       2.7   
09-Jun-2016 08:49:10    75.8    0.01       2.7   
09-Jun-2016 08:49:10      82    0.01       2.7   
09-Jun-2016 05:03:11    66.2    0.05         3   
09-Jun-2016 08:49:10    67.2    0.01       2.7   
09-Jun-2016 04:12:00    58.8     NaN       NaN   

Remove all rows that have missing row times or data values. Assign the remaining rows to the timetable goodValuesTT.

goodValuesTT = rmmissing(TT)

goodValuesTT=10×3 timetable Time Temp Rain WindSpeed ____________________ ____ ____ _________

09-Jun-2016 06:01:04      73    0.01       2.3   
09-Jun-2016 07:59:23      59    0.08       0.9   
09-Jun-2016 09:53:57      59    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   
09-Jun-2016 08:49:10      62    0.01       2.7   
09-Jun-2016 08:49:10    75.8    0.01       2.7   
09-Jun-2016 08:49:10      82    0.01       2.7   
09-Jun-2016 05:03:11    66.2    0.05         3   
09-Jun-2016 08:49:10    67.2    0.01       2.7   

Sort Timetable and Determine If It Is Regular

After dealing with missing values, you can go on to sort your timetable and then determine if the sorted timetable is regular.

To determine if goodValuesTT is already sorted, use the issorted function.

tf = issorted(goodValuesTT)

Since it is not, sort the timetable on its row times by using the sortrows function.

sortedTT = sortrows(goodValuesTT)

sortedTT=10×3 timetable Time Temp Rain WindSpeed ____________________ ____ ____ _________

09-Jun-2016 05:03:11    66.2    0.05         3   
09-Jun-2016 06:01:04      73    0.01       2.3   
09-Jun-2016 07:59:23      59    0.08       0.9   
09-Jun-2016 08:49:10      62    0.01       2.7   
09-Jun-2016 08:49:10    75.8    0.01       2.7   
09-Jun-2016 08:49:10      82    0.01       2.7   
09-Jun-2016 08:49:10    67.2    0.01       2.7   
09-Jun-2016 09:53:57      59    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   

Determine whether sortedTT is regular. A regular timetable has the same time interval between consecutive row times. Even a sorted timetable can have time steps that are not uniform.

Since it is not, display the differences between row times.

ans = 9×1 duration 00:57:53 01:58:19 00:49:47 00:00:00 00:00:00 00:00:00 01:04:47 00:00:00 00:00:00

Since the row times are sorted, this result shows that some row times are unique and some are duplicates.

Remove Duplicate Rows

Timetables can have duplicate rows. Timetable rows are duplicates if they have the same row times and the same data values. In this example, the last two rows of sortedTT are duplicate rows. (There are other rows in sortedTT that have duplicate row times but differing data values.)

To remove the duplicate rows from sortedTT, use unique. The unique function returns the unique rows and sorts them by their row times.

uniqueRowsTT = unique(sortedTT)

uniqueRowsTT=9×3 timetable Time Temp Rain WindSpeed ____________________ ____ ____ _________

09-Jun-2016 05:03:11    66.2    0.05         3   
09-Jun-2016 06:01:04      73    0.01       2.3   
09-Jun-2016 07:59:23      59    0.08       0.9   
09-Jun-2016 08:49:10      62    0.01       2.7   
09-Jun-2016 08:49:10    67.2    0.01       2.7   
09-Jun-2016 08:49:10    75.8    0.01       2.7   
09-Jun-2016 08:49:10      82    0.01       2.7   
09-Jun-2016 09:53:57      59    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   

Find Rows with Duplicate Times and Different Data

Timetables can have rows with duplicate row times but different data values. In this example, uniqueRowsTT has several rows with the same row times but different values.

Find the rows that have duplicate row times. First, sort the row times and find consecutive times that have no difference between them. Times with no difference between them are the duplicates. Index back into the vector of row times and return a unique set of times that identify the duplicate row times in uniqueRowsTT.

dupTimes = sort(uniqueRowsTT.Time); tf = (diff(dupTimes) == 0); dupTimes = dupTimes(tf); dupTimes = unique(dupTimes)

dupTimes = 2×1 datetime 09-Jun-2016 08:49:10 09-Jun-2016 09:53:57

To display the rows with duplicate row times, index into uniqueRowsTT using dupTimes. When you index on times, the output timetable contains all rows with matching row times.

ans=6×3 timetable Time Temp Rain WindSpeed ____________________ ____ ____ _________

09-Jun-2016 08:49:10      62    0.01       2.7   
09-Jun-2016 08:49:10    67.2    0.01       2.7   
09-Jun-2016 08:49:10    75.8    0.01       2.7   
09-Jun-2016 08:49:10      82    0.01       2.7   
09-Jun-2016 09:53:57      59    0.03       3.4   
09-Jun-2016 09:53:57      67    0.03       3.4   

Select First and Last Rows with Duplicate Times

When a timetable has rows with duplicate times, you might want to select particular rows and discard the other rows having duplicate times. For example, you can select either the first or the last of the rows with duplicate row times by using the unique and retime functions.

First, create a vector of unique row times from TT by using unique.

uniqueTimes = unique(uniqueRowsTT.Time)

uniqueTimes = 5×1 datetime 09-Jun-2016 05:03:11 09-Jun-2016 06:01:04 09-Jun-2016 07:59:23 09-Jun-2016 08:49:10 09-Jun-2016 09:53:57

Select the first row from each set of rows that have duplicate times. To copy data from the first rows, specify the 'firstvalue' method.

firstUniqueRowsTT = retime(uniqueRowsTT,uniqueTimes,'firstvalue')

firstUniqueRowsTT=5×3 timetable Time Temp Rain WindSpeed ____________________ ____ ____ _________

09-Jun-2016 05:03:11    66.2    0.05         3   
09-Jun-2016 06:01:04      73    0.01       2.3   
09-Jun-2016 07:59:23      59    0.08       0.9   
09-Jun-2016 08:49:10      62    0.01       2.7   
09-Jun-2016 09:53:57      59    0.03       3.4   

Select the last rows from each set of rows that have duplicate times. To copy data from the last rows, specify the 'lastvalue' method.

lastUniqueRowsTT = retime(uniqueRowsTT,uniqueTimes,'lastvalue')

lastUniqueRowsTT=5×3 timetable Time Temp Rain WindSpeed ____________________ ____ ____ _________

09-Jun-2016 05:03:11    66.2    0.05         3   
09-Jun-2016 06:01:04      73    0.01       2.3   
09-Jun-2016 07:59:23      59    0.08       0.9   
09-Jun-2016 08:49:10      82    0.01       2.7   
09-Jun-2016 09:53:57      67    0.03       3.4   

As a result, the last two rows of firstUniqueRowsTT and lastUniqueRowsTT have different values in the Temp variable.

Aggregate Data from All Rows with Duplicate Times

Another way to deal with data in the rows having duplicate times is to aggregate or combine the data values in some way. For example, you can calculate the means of several measurements of the same quantity taken at the same time.

Calculate the mean temperature, rainfall, and wind speed for rows with duplicate row times using the retime function.

meanTT = retime(uniqueRowsTT,uniqueTimes,'mean')

meanTT=5×3 timetable Time Temp Rain WindSpeed ____________________ _____ ____ _________

09-Jun-2016 05:03:11     66.2    0.05         3   
09-Jun-2016 06:01:04       73    0.01       2.3   
09-Jun-2016 07:59:23       59    0.08       0.9   
09-Jun-2016 08:49:10    71.75    0.01       2.7   
09-Jun-2016 09:53:57       63    0.03       3.4   

As a result, the last two rows of meanTT have mean temperatures in the Temp variable for the rows with duplicate row times.

Make Timetable Regular

Finally, you can resample data from an irregular timetable to make it regular by using the retime function. For example, you can interpolate the data from meanTT onto a regular hourly time vector. To use linear interpolation, specify 'linear'. Each row time in hourlyTT begins on the hour, and there is a one-hour interval between consecutive row times.

hourlyTT = retime(meanTT,'hourly','linear')

hourlyTT=6×3 timetable Time Temp Rain WindSpeed ____________________ ______ ________ _________

09-Jun-2016 05:00:00    65.826      0.0522     3.0385  
09-Jun-2016 06:00:00    72.875    0.010737     2.3129  
09-Jun-2016 07:00:00    66.027    0.044867     1.6027  
09-Jun-2016 08:00:00    59.158    0.079133     0.9223  
09-Jun-2016 09:00:00    70.287    0.013344     2.8171  
09-Jun-2016 10:00:00    62.183    0.031868     3.4654  

Instead of using a predefined time step such as 'hourly', you can specify a time step of your own. To specify a time step of 30 minutes, use the 'regular' input argument and the 'TimeStep' name-value argument. You can specify a time step of any size as a duration or calendarDuration value.

regularTT = retime(meanTT,'regular','linear','TimeStep',minutes(30))

regularTT=11×3 timetable Time Temp Rain WindSpeed ____________________ ______ ________ _________

09-Jun-2016 05:00:00    65.826      0.0522     3.0385  
09-Jun-2016 05:30:00     69.35    0.031468     2.6757  
09-Jun-2016 06:00:00    72.875    0.010737     2.3129  
09-Jun-2016 06:30:00    69.576    0.027118     1.9576  
09-Jun-2016 07:00:00    66.027    0.044867     1.6027  
09-Jun-2016 07:30:00    62.477    0.062616     1.2477  
09-Jun-2016 08:00:00    59.158    0.079133     0.9223  
09-Jun-2016 08:30:00    66.841     0.03695      2.007  
09-Jun-2016 09:00:00    70.287    0.013344     2.8171  
09-Jun-2016 09:30:00    66.235    0.022606     3.1412  
09-Jun-2016 10:00:00    62.183    0.031868     3.4654  

Extract Regular Timetable Data into Array

You can export the timetable data for use with functions to analyze data that is regularly spaced in time. For example, the Econometrics Toolbox™ and the Signal Processing Toolbox™ have functions you can use for further analysis on regularly spaced data.

Extract the timetable data as an array. You can use the Variables property to return the data as an array, as long as the table variables have data types that allow them to be concatenated.

A = 11×3

65.8260 0.0522 3.0385 69.3504 0.0315 2.6757 72.8747 0.0107 2.3129 69.5764 0.0271 1.9576 66.0266 0.0449 1.6027 62.4768 0.0626 1.2477 59.1579 0.0791 0.9223 66.8412 0.0370 2.0070 70.2868 0.0133 2.8171 66.2348 0.0226 3.1412 62.1829 0.0319 3.4654 ⋮

regularTT.Variables is equivalent to using curly brace syntax, regularTT{:,:}, to access the data in the timetable variables.

A2 = 11×3

65.8260 0.0522 3.0385 69.3504 0.0315 2.6757 72.8747 0.0107 2.3129 69.5764 0.0271 1.9576 66.0266 0.0449 1.6027 62.4768 0.0626 1.2477 59.1579 0.0791 0.9223 66.8412 0.0370 2.0070 70.2868 0.0133 2.8171 66.2348 0.0226 3.1412 62.1829 0.0319 3.4654 ⋮

See Also

timetable | table2timetable | retime | issorted | sortrows | unique | diff | isregular | rmmissing | fillmissing

Topics