duplicated time series index entries after hdf5 store (daylight saving time related) · Issue #1081 · pandas-dev/pandas

>> df
<class 'pandas.core.frame.DataFrame'>
Index: 596520 entries, 2006-04-14 00:00:00 to 2011-12-31 23:55:00
Data columns:
g_m_pyr__0       596520  non-null values
e_wr__0          596520  non-null values
p_nenn_sg__0     596520  non-null values
flaeche_sg__0    596520  non-null values
dtypes: float64(4)
>> df.index[0].timetuple()
time.struct_time(tm_year=2006, tm_mon=4, tm_mday=14, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=104, tm_isdst=-1)
>> df.index.get_duplicates()
[]

Saving this DataFrame to HDF5 and reloading it results in duplicated index entries:

>> store = pd.io.pytables.HDFStore('store.h5')
>> store['df'] = df 
>> df2 = store['df']
>> df2.index.get_duplicates()
[datetime.datetime(2007, 3, 25, 3, 0),
 datetime.datetime(2007, 3, 25, 3, 5),
 datetime.datetime(2007, 3, 25, 3, 10),
 datetime.datetime(2007, 3, 25, 3, 15),
 datetime.datetime(2007, 3, 25, 3, 20),
 datetime.datetime(2007, 3, 25, 3, 25),
 datetime.datetime(2007, 3, 25, 3, 30),
 datetime.datetime(2007, 3, 25, 3, 35),
 datetime.datetime(2007, 3, 25, 3, 40),
 datetime.datetime(2007, 3, 25, 3, 45),
 datetime.datetime(2007, 3, 25, 3, 50),
 datetime.datetime(2007, 3, 25, 3, 55),
 datetime.datetime(2008, 3, 30, 3, 0),
 datetime.datetime(2008, 3, 30, 3, 5),
 datetime.datetime(2008, 3, 30, 3, 10),
 datetime.datetime(2008, 3, 30, 3, 15),
 datetime.datetime(2008, 3, 30, 3, 20),
 datetime.datetime(2008, 3, 30, 3, 25),
 datetime.datetime(2008, 3, 30, 3, 30),
 datetime.datetime(2008, 3, 30, 3, 35),
 datetime.datetime(2008, 3, 30, 3, 40),
 datetime.datetime(2008, 3, 30, 3, 45),
 datetime.datetime(2008, 3, 30, 3, 50),
 datetime.datetime(2008, 3, 30, 3, 55),
 datetime.datetime(2009, 3, 29, 3, 0),
 datetime.datetime(2009, 3, 29, 3, 5),
 datetime.datetime(2009, 3, 29, 3, 10),
 datetime.datetime(2009, 3, 29, 3, 15),
 datetime.datetime(2009, 3, 29, 3, 20),
 datetime.datetime(2009, 3, 29, 3, 25),
 datetime.datetime(2009, 3, 29, 3, 30),
 datetime.datetime(2009, 3, 29, 3, 35),
 datetime.datetime(2009, 3, 29, 3, 40),
 datetime.datetime(2009, 3, 29, 3, 45),
 datetime.datetime(2009, 3, 29, 3, 50),
 datetime.datetime(2009, 3, 29, 3, 55),
 datetime.datetime(2010, 3, 28, 3, 0),
 datetime.datetime(2010, 3, 28, 3, 5),
 datetime.datetime(2010, 3, 28, 3, 10),
 datetime.datetime(2010, 3, 28, 3, 15),
 datetime.datetime(2010, 3, 28, 3, 20),
 datetime.datetime(2010, 3, 28, 3, 25),
 datetime.datetime(2010, 3, 28, 3, 30),
 datetime.datetime(2010, 3, 28, 3, 35),
 datetime.datetime(2010, 3, 28, 3, 40),
 datetime.datetime(2010, 3, 28, 3, 45),
 datetime.datetime(2010, 3, 28, 3, 50),
 datetime.datetime(2010, 3, 28, 3, 55),
 datetime.datetime(2011, 3, 27, 3, 0),
 datetime.datetime(2011, 3, 27, 3, 5),
 datetime.datetime(2011, 3, 27, 3, 10),
 datetime.datetime(2011, 3, 27, 3, 15),
 datetime.datetime(2011, 3, 27, 3, 20),
 datetime.datetime(2011, 3, 27, 3, 25),
 datetime.datetime(2011, 3, 27, 3, 30),
 datetime.datetime(2011, 3, 27, 3, 35),
 datetime.datetime(2011, 3, 27, 3, 40),
 datetime.datetime(2011, 3, 27, 3, 45),
 datetime.datetime(2011, 3, 27, 3, 50),
 datetime.datetime(2011, 3, 27, 3, 55)]

I get the same duplicates with other time series of the same kind, and the duplicates fall in March for every DataFrame.
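
All the duplicated timestamps lie between 03:00 and 03:55 on the last Sunday in March, which is when Central European clocks jump from 02:00 to 03:00. One plausible explanation (not confirmed against the pandas source) is that the index is converted to epoch seconds through local time on write and back on read. The sketch below reproduces that effect with the standard library only. The zone `Europe/Berlin` and the helper `local_roundtrip` are assumptions made for illustration; they are not taken from pandas or from the data above.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Assumption: the timestamps are interpreted in a Central European zone.
LOCAL_TZ = ZoneInfo("Europe/Berlin")


def local_roundtrip(naive):
    """Hypothetical helper: naive datetime -> epoch seconds -> naive datetime,
    interpreting the value as local wall-clock time in LOCAL_TZ both ways."""
    epoch_seconds = naive.replace(tzinfo=LOCAL_TZ).timestamp()
    return datetime.fromtimestamp(epoch_seconds, tz=LOCAL_TZ).replace(tzinfo=None)


# 5-minute steps from 01:00 to 03:55 on 2007-03-25 (36 unique naive values).
start = datetime(2007, 3, 25, 1, 0)
original = [start + timedelta(minutes=5 * i) for i in range(36)]

restored = [local_roundtrip(ts) for ts in original]

# Wall-clock times 02:00..02:55 do not exist on that day in this zone,
# so they come back as 03:00..03:55 and collide with the real entries.
duplicates = sorted({ts for ts in restored if restored.count(ts) > 1})

print(len(set(original)), len(set(restored)))
print(duplicates[0], duplicates[-1])
```

If this is indeed the cause, an index that never passes through local time (for example one stored as UTC-based integers) should survive the round trip without duplicates.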