pd.concat loses frequency attribute for 'continuous' DataFrame appends · Issue #3232 · pandas-dev/pandas

Hey all,

I have a DataFrame (df) that stores live sensor data captured at a fixed frequency. New raw data from the sensor arrives in batches at a set interval (an attempt at bandwidth conservation), and each batch is parsed into a new DataFrame.

These update DataFrames have the same frequency as the main one and contain data that is 'continuous' in time (i.e., each one picks up right where the last timestamp left off). Ultimately, I would like to append the new data to the existing DataFrame while preserving the frequency attribute of the main DataFrame's index. I tried concatenating the old and new DataFrames with pd.concat, but concat doesn't seem to check for this case of continuous time series, and the resulting index loses its frequency attribute. This can be reproduced with the code below:

import pandas as pd
import numpy as np

dr = pd.date_range('01-Jan-2013', periods=100, freq='50L', tz='UTC')
df = pd.DataFrame(np.random.randn(100, 2), index=dr)
df.index

<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-01 00:00:04.950000]
Length: 100, Freq: 50L, Timezone: UTC

These guys look good:

# Preserves frequency
print(df[:50].index)
print(df[50:].index)

<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-01 00:00:02.450000]
Length: 50, Freq: 50L, Timezone: UTC
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:02.500000, ..., 2013-01-01 00:00:04.950000]
Length: 50, Freq: 50L, Timezone: UTC

However, these guys, together, forget where they came from:

# Loses frequency
pd.concat([df[:50], df[50:]]).index

<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-01 00:00:04.950000]
Length: 100, Freq: None, Timezone: UTC
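To make the 'continuous' condition concrete, here is a minimal sketch of the check I have in mind: both indexes carry the same frequency, and the second one starts exactly one step after the first one ends. The name `is_continuation` is hypothetical (it is not a pandas function), and the sketch uses the '50ms' alias, which I'm assuming is equivalent to the '50L' alias used above.

```python
import pandas as pd


def is_continuation(main_index, update_index):
    """Hypothetical helper: True if update_index starts one step after main_index ends."""
    freq = main_index.freq
    # Both pieces must carry a frequency, and it must be the same one.
    if freq is None or update_index.freq != freq:
        return False
    # The first new timestamp must sit exactly one step after the last old one.
    return main_index[-1] + freq == update_index[0]


dr = pd.date_range('2013-01-01', periods=100, freq='50ms', tz='UTC')

print(is_continuation(dr[:50], dr[50:]))  # adjacent slices
print(is_continuation(dr[:50], dr[60:]))  # slices with a gap between them
```

When a check like this passes, the combined index is evenly spaced, so keeping the original frequency on the result would be safe.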

I currently work around this by resampling the resulting df to set the frequency, which isn't that big of a deal, but I thought I'd mention it so that a more elegant behavior could be implemented. I'll try to take a look when I have time, but I know that all of you here are much more familiar with pandas internals. Any pointers?
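For anyone hitting the same thing, a rough sketch of this kind of workaround follows. It uses `DataFrame.asfreq` rather than a full resample, which I'm assuming is enough when the data already sits on a regular grid. `append_with_freq` is a hypothetical helper name, and '50ms' is again assumed to be equivalent to the '50L' alias above.

```python
import numpy as np
import pandas as pd


def append_with_freq(main, update, freq):
    """Hypothetical helper: concatenate two frames and restore a fixed frequency."""
    combined = pd.concat([main, update])
    # asfreq reindexes onto a regular grid, so the resulting index carries `freq`.
    # Any grid timestamp missing from the data would show up as a NaN row.
    return combined.asfreq(freq)


dr = pd.date_range('2013-01-01', periods=100, freq='50ms', tz='UTC')
df = pd.DataFrame(np.random.randn(100, 2), index=dr)

restored = append_with_freq(df[:50], df[50:], '50ms')
print(restored.index.freq)
```

The downside is that the frequency has to be passed in by hand, and a gap in the data is silently filled with NaN rows instead of being reported.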
And, as always, thank you! :)