Left Join with timedelta64 does not produce correct nulls · Issue #5695 · pandas-dev/pandas

related example: http://stackoverflow.com/questions/20789976/python-pandas-dataframe-1st-line-issue-with-datetime-timedelta/20802902?noredirect=1#comment31195305_20802902

import datetime
import pandas as pd
parms = {'d':  datetime.datetime(2013, 11, 5, 5, 56), 't':datetime.timedelta(0, 22500)}
df = pd.DataFrame(columns=list('dt'))
df = df.append(parms, ignore_index=True)
Erroneous output (the first row's timedelta is displayed as an integer nanosecond count instead of 6:15:00):
>>> df.append(parms, ignore_index=True)
                    d               t
0 2013-11-05 05:56:00  22500000000000
1 2013-11-05 05:56:00         6:15:00

The notion of nullness is not handled well for timedelta64 columns when performing a left join:

In [194]:
import numpy as np
left = pd.DataFrame(pd.Series([np.timedelta64(300000000), np.timedelta64(300000000)], dtype='m8[ns]', index=["A", "B"]))
right = pd.DataFrame(pd.Series([np.timedelta64(300000000)], dtype='m8[ns]', index=["A"]))
left.join(right, rsuffix='r', how="left").info()

Out[194]:
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, A to B
Data columns (total 2 columns):
0     2  non-null values
0r    1  non-null values
dtypes: float64(1), timedelta64[ns](1)

The right-hand column 0r, which mixes a timedelta64 value with a null, is cast to float64 instead of staying timedelta64[ns].

This seems incorrect, since NaT can represent a null in a timedelta64 column:

In [196]:
pd.Series([np.timedelta64(300000000), pd.NaT],dtype='m8[ns]')

Out[196]:
0   00:00:00.300000
1               NaT
dtype: timedelta64[ns]
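
For reference, the behaviour requested above can be written as a small check. This is only a sketch of the desired outcome, not a statement about any particular pandas release: the column name "td" is a hypothetical stand-in for the default integer label 0 used in the report, and the check is expected to pass only on a pandas version where the left join keeps the timedelta64 dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical column name "td"; the report uses the default integer label 0.
left = pd.Series(
    np.array([300_000_000, 300_000_000], dtype="m8[ns]"),
    index=["A", "B"],
    name="td",
).to_frame()
right = pd.Series(
    np.array([300_000_000], dtype="m8[ns]"),
    index=["A"],
    name="td",
).to_frame()

# Same left join as in the report; the right-hand column becomes "tdr".
joined = left.join(right, rsuffix="r", how="left")

# Desired outcome: the joined column stays timedelta-typed and the
# unmatched row "B" is null (NaT) rather than a float NaN.
is_timedelta = joined["tdr"].dtype.kind == "m"
null_mask = joined["tdr"].isna().tolist()
print(joined.dtypes)
print(is_timedelta, null_mask)
```

If is_timedelta comes out False, the join has fallen back to another dtype, which is the behaviour reported in this issue.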