Left Join with timedelta64 does not produce correct nulls · Issue #5695 · pandas-dev/pandas
Related example: http://stackoverflow.com/questions/20789976/python-pandas-dataframe-1st-line-issue-with-datetime-timedelta/20802902?noredirect=1#comment31195305_20802902
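import datetime
import pandas as pd
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this needs an older pandas.
parms = {'d': datetime.datetime(2013, 11, 5, 5, 56), 't': datetime.timedelta(0, 22500)}
df = pd.DataFrame(columns=list('dt'))
df = df.append(parms, ignore_index=True)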
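Appending the same row again gives an inconsistent result: the first row's timedelta is shown as a raw nanosecond count (22500 s = 22500000000000 ns) instead of 6:15:00: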
>>> df.append(parms, ignore_index=True)
d t
0 2013-11-05 05:56:00 22500000000000
1 2013-11-05 05:56:00 6:15:00
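On current pandas, where `DataFrame.append` no longer exists, one way to sidestep appending to an empty object-dtype frame is to collect the rows first and build the DataFrame in a single call. This is a minimal sketch, not the fix adopted in the issue; `rows` is a hypothetical name, and the assumption is that pandas infers datetime64 and timedelta64 columns from the `datetime`/`timedelta` values themselves.

```python
import datetime

import pandas as pd

parms = {"d": datetime.datetime(2013, 11, 5, 5, 56), "t": datetime.timedelta(0, 22500)}

# Hypothetical workaround: gather all rows first, then construct the frame once,
# so column dtypes are inferred from the values rather than from an empty frame.
rows = [parms, parms]
df = pd.DataFrame(rows)

print(df.dtypes)
print(df)
```

Both rows should then display the timedelta as 6:15:00 rather than one of them showing raw nanoseconds.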
Nulls are not handled correctly for timedelta64 columns when performing a left join:
In [194]:
import numpy as np
left = pd.DataFrame(pd.Series([np.timedelta64(300000000), np.timedelta64(300000000)], dtype='m8[ns]', index=["A", "B"]))
right = pd.DataFrame(pd.Series([np.timedelta64(300000000)], dtype='m8[ns]', index=["A"]))
left.join(right, rsuffix='r', how="left").info()
Out[194]:
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, A to B
Data columns (total 2 columns):
0 2 non-null values
0r 1 non-null values
dtypes: float64(1), timedelta64[ns](1)
The `0r` column, which mixes timedelta64 values with nulls, gets cast to float64.
This seems incorrect, since NaT can represent the missing value in a timedelta64 column:
In [196]:
pd.Series([np.timedelta64(300000000), pd.NaT],dtype='m8[ns]')
Out[196]:
0 00:00:00.300000
1 NaT
dtype: timedelta64[ns]
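The behaviour the report expects can be written down as a small regression check. This is a sketch assuming a recent pandas release; the column names `x` and `y` are hypothetical, chosen so that no `rsuffix` is needed. It passes when the unmatched row `B` receives NaT and the joined column keeps its timedelta64 dtype instead of becoming float64.

```python
import numpy as np
import pandas as pd

# Same data as the report, with an explicit nanosecond unit (0.3 seconds).
td = np.timedelta64(300000000, "ns")
left = pd.DataFrame({"x": pd.Series([td, td], dtype="m8[ns]", index=["A", "B"])})
right = pd.DataFrame({"y": pd.Series([td], dtype="m8[ns]", index=["A"])})

joined = left.join(right, how="left")

# Row "B" has no match in `right`, so its "y" entry should be NaT
# and the column should stay timedelta64[ns].
print(joined.dtypes)
print(joined)
```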