Duplicated time series index entries after HDF5 store (daylight saving time related) · Issue #1081 · pandas-dev/pandas
>> df
<class 'pandas.core.frame.DataFrame'>
Index: 596520 entries, 2006-04-14 00:00:00 to 2011-12-31 23:55:00
Data columns:
g_m_pyr__0 596520 non-null values
e_wr__0 596520 non-null values
p_nenn_sg__0 596520 non-null values
flaeche_sg__0 596520 non-null values
dtypes: float64(4)
>> df.index[0].timetuple()
time.struct_time(tm_year=2006, tm_mon=4, tm_mday=14, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=104, tm_isdst=-1)
>> df.index.get_duplicates()
[]
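The `tm_isdst=-1` in the `timetuple()` output above is what a naive `datetime` reports. No `tzinfo` is attached, so whether DST applies is unknown. Per the Python `time` docs, `-1` tells `time.mktime()` to work out the DST state itself from the machine's local timezone rules. A minimal standard-library check:

```python
from datetime import datetime

# A naive datetime has no tzinfo, so timetuple() cannot tell whether DST
# applies and reports tm_isdst = -1 ("unknown").
first = datetime(2006, 4, 14, 0, 0)
tt = first.timetuple()
print(tt.tm_yday, tt.tm_isdst)  # 104 -1
```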
Saving this DataFrame to HDF5 and reloading it results in duplicated index entries:
>> store = pd.io.pytables.HDFStore('store.h5')
>> store['df'] = df
>> df2 = store['df']
>> df2.index.get_duplicates()
[datetime.datetime(2007, 3, 25, 3, 0),
datetime.datetime(2007, 3, 25, 3, 5),
datetime.datetime(2007, 3, 25, 3, 10),
datetime.datetime(2007, 3, 25, 3, 15),
datetime.datetime(2007, 3, 25, 3, 20),
datetime.datetime(2007, 3, 25, 3, 25),
datetime.datetime(2007, 3, 25, 3, 30),
datetime.datetime(2007, 3, 25, 3, 35),
datetime.datetime(2007, 3, 25, 3, 40),
datetime.datetime(2007, 3, 25, 3, 45),
datetime.datetime(2007, 3, 25, 3, 50),
datetime.datetime(2007, 3, 25, 3, 55),
datetime.datetime(2008, 3, 30, 3, 0),
datetime.datetime(2008, 3, 30, 3, 5),
datetime.datetime(2008, 3, 30, 3, 10),
datetime.datetime(2008, 3, 30, 3, 15),
datetime.datetime(2008, 3, 30, 3, 20),
datetime.datetime(2008, 3, 30, 3, 25),
datetime.datetime(2008, 3, 30, 3, 30),
datetime.datetime(2008, 3, 30, 3, 35),
datetime.datetime(2008, 3, 30, 3, 40),
datetime.datetime(2008, 3, 30, 3, 45),
datetime.datetime(2008, 3, 30, 3, 50),
datetime.datetime(2008, 3, 30, 3, 55),
datetime.datetime(2009, 3, 29, 3, 0),
datetime.datetime(2009, 3, 29, 3, 5),
datetime.datetime(2009, 3, 29, 3, 10),
datetime.datetime(2009, 3, 29, 3, 15),
datetime.datetime(2009, 3, 29, 3, 20),
datetime.datetime(2009, 3, 29, 3, 25),
datetime.datetime(2009, 3, 29, 3, 30),
datetime.datetime(2009, 3, 29, 3, 35),
datetime.datetime(2009, 3, 29, 3, 40),
datetime.datetime(2009, 3, 29, 3, 45),
datetime.datetime(2009, 3, 29, 3, 50),
datetime.datetime(2009, 3, 29, 3, 55),
datetime.datetime(2010, 3, 28, 3, 0),
datetime.datetime(2010, 3, 28, 3, 5),
datetime.datetime(2010, 3, 28, 3, 10),
datetime.datetime(2010, 3, 28, 3, 15),
datetime.datetime(2010, 3, 28, 3, 20),
datetime.datetime(2010, 3, 28, 3, 25),
datetime.datetime(2010, 3, 28, 3, 30),
datetime.datetime(2010, 3, 28, 3, 35),
datetime.datetime(2010, 3, 28, 3, 40),
datetime.datetime(2010, 3, 28, 3, 45),
datetime.datetime(2010, 3, 28, 3, 50),
datetime.datetime(2010, 3, 28, 3, 55),
datetime.datetime(2011, 3, 27, 3, 0),
datetime.datetime(2011, 3, 27, 3, 5),
datetime.datetime(2011, 3, 27, 3, 10),
datetime.datetime(2011, 3, 27, 3, 15),
datetime.datetime(2011, 3, 27, 3, 20),
datetime.datetime(2011, 3, 27, 3, 25),
datetime.datetime(2011, 3, 27, 3, 30),
datetime.datetime(2011, 3, 27, 3, 35),
datetime.datetime(2011, 3, 27, 3, 40),
datetime.datetime(2011, 3, 27, 3, 45),
datetime.datetime(2011, 3, 27, 3, 50),
datetime.datetime(2011, 3, 27, 3, 55)]
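The 60 duplicated stamps form one block per year: the twelve 5-minute slots from 03:00 to 03:55 on a single Sunday in late March. The sketch below checks, with the standard library only, that each of those days is the last Sunday of March (the EU switch to summer time) and rebuilds the list from that rule. 2006 is absent because the index starts on 2006-04-14, after that year's switch on 26 March. The helper `last_sunday_of_march` is hypothetical and written for illustration.

```python
from datetime import date, datetime, timedelta

def last_sunday_of_march(year):
    """Last Sunday in March: the EU switch from standard to summer time."""
    d = date(year, 3, 31)
    # weekday(): Monday == 0 ... Sunday == 6; step back to the nearest Sunday.
    return d - timedelta(days=(d.weekday() - 6) % 7)

# Spring switch days covered by the index (2006's switch precedes 2006-04-14).
switch_days = [last_sunday_of_march(y) for y in range(2007, 2012)]

# 03:00 to 03:55 in 5-minute steps on each switch day.
expected = [datetime(d.year, d.month, d.day, 3, 5 * i)
            for d in switch_days for i in range(12)]

print(len(expected))  # 60
print(expected[-1])   # 2011-03-27 03:55:00
```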
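I get the same duplicates with other time series of the same kind, and for every DataFrame they fall in March: the twelve 5-minute stamps from 03:00 to 03:55 on the last Sunday of the month, the day the EU switches from standard to summer time.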
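One plausible mechanism, offered as an assumption rather than a confirmed diagnosis: the naive stamps (note `tm_isdst=-1` above) are converted to epoch seconds through local time and back. In Central European time the wall-clock hour 02:00 to 02:55 does not exist on the last Sunday of March. A conversion that resolves those stamps with the pre-switch offset (as PEP 495 does for `fold=0`) lands them on the same instants as 03:00 to 03:55. The sketch hard-codes a simplified CET/CEST rule covering only the spring switch. `wall_to_utc`, `utc_to_wall`, `via_local_time` and `via_utc` are hypothetical helpers, not pandas or PyTables functions. For contrast, the same stamps are also round-tripped through `calendar.timegm`, which treats the tuple as UTC and so has no gap.

```python
import calendar
from datetime import datetime, timedelta, timezone

def last_sunday_of_march(year):
    """Day of the EU spring DST switch (last Sunday in March), at midnight."""
    d = datetime(year, 3, 31)
    return d - timedelta(days=(d.weekday() - 6) % 7)

# Hypothetical helpers: a simplified CET/CEST model of the spring switch only,
# so they are meaningful only for dates before the October switch.
def wall_to_utc(wall):
    """Naive local wall time -> naive UTC; gap times (02:xx) get CET (+1h)."""
    cest_start = last_sunday_of_march(wall.year).replace(hour=3)  # first CEST wall time
    offset = timedelta(hours=2) if wall >= cest_start else timedelta(hours=1)
    return wall - offset

def utc_to_wall(utc):
    """Naive UTC -> naive local wall time."""
    switch_utc = last_sunday_of_march(utc.year).replace(hour=1)  # 01:00 UTC
    offset = timedelta(hours=2) if utc >= switch_utc else timedelta(hours=1)
    return utc + offset

def via_local_time(stamp):
    """Round trip through the local-time conversion (lossy in the gap)."""
    return utc_to_wall(wall_to_utc(stamp))

def via_utc(stamp):
    """Round trip through calendar.timegm, which treats the tuple as UTC."""
    seconds = calendar.timegm(stamp.timetuple())
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)

# A naive 5-minute grid across the 2007 switch, 01:00 to 03:55.
start = datetime(2007, 3, 25, 1, 0)
grid = [start + timedelta(minutes=5 * i) for i in range(36)]

local = [via_local_time(t) for t in grid]
utc = [via_utc(t) for t in grid]

# Stamps that now occur more than once after the local-time round trip.
dups = sorted({t for t in local if local.count(t) > 1})
print(len(dups), dups[0], dups[-1])  # 12 2007-03-25 03:00:00 2007-03-25 03:55:00
print(utc == grid)                   # True
```

Under this model exactly the twelve stamps 03:00 to 03:55 come out duplicated, matching the pattern in the report, while the UTC-based round trip is lossless. Whether the HDF5 store actually converts through local time would need to be confirmed in the storage code.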